@sminot
In response to #68, I've patched the input files to include the ".gz" extension, here.
Also discussed in that issue: I think that when a user does not provide `--sample_grouping_col` (e.g. if no sample replicates exist), the workflow should by default simply skip aggregating samples. Simply setting the default value of that parameter to "sample_id" doesn't seem to work either, AFAICT because sample_id is used as the index for locating samples when you shard the aggregate_organisms step. Can you think of an easy way to skip sample aggregation?
P.S. I'm not sure if you have a Virscan testing/dev set, but I'm just running the workflow on the default pan-CoV-example data. I think this should be fine?
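On the skip-aggregation question, here is a rough sketch of the behavior I have in mind, in plain Python rather than the workflow's own code. Everything in it is hypothetical: the function name `aggregate_samples`, the dict-based inputs, and summing counts as the aggregation are placeholders for whatever the real step does. The only point is that a missing grouping column passes samples through unchanged instead of grouping on "sample_id".

```python
from collections import defaultdict


def aggregate_samples(counts, sample_table, grouping_col=None):
    """Hypothetical stand-in for the sample aggregation step.

    counts:       {sample_id: {peptide: count}}
    sample_table: {sample_id: {column_name: value}}
    grouping_col: column used to group replicates, or None to skip.
    """
    # No grouping column given: skip aggregation and keep per-sample data.
    if grouping_col is None:
        return {sample_id: dict(peps) for sample_id, peps in counts.items()}

    # Otherwise merge replicates that share a value in grouping_col.
    # Summing is only a placeholder for the real aggregation logic.
    merged = defaultdict(lambda: defaultdict(int))
    for sample_id, peps in counts.items():
        group = sample_table[sample_id][grouping_col]
        for peptide, n in peps.items():
            merged[group][peptide] += n
    return {group: dict(peps) for group, peps in merged.items()}
```

With this shape, downstream steps that index by sample_id would keep working when no replicates exist, since the keys are left untouched in the skip case.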
All advice (/ pushes to this branch) welcomed and appreciated!