jsbrittain closed 3 months ago
Docstring currently reads:
Subsampler (Anderson Brito)
A pipeline for subsampling genomic data based on epidemiological time series data. This pipeline was developed by Anderson Brito and is available at: https://github.com/andersonbrito/subsampler/tree/master.
Ports:
data: Input data (metadata and case data)
config: Configuration files (keep, remove and filter files), default none if unconnected
Params:
Metadata file (str): Name of the metadata file
Case data file (str): Name of the case data file
Subsampler:
keep file (str): File designating sequences to keep. This should be a plain-text (.txt) file containing one ID (e.g. EPI_ISL_402125) per line. Leave blank for none.
remove file (str): File designating sequences to remove. This should be a plain-text (.txt) file containing one ID (e.g. EPI_ISL_12345678) per line. Leave blank for none.
filter file (str): File containing sequence filters. This should be a tab-delimited file (.tsv) with columns `action`, `column` and `value`. `action` can contain "exclude", the `column` should contain the name of the column to filter on, and the `value` column should contain the value to filter on. Leave blank for none.
ID Column (str): Column name for the ID
GEO Column (str): Column name for the geographic location
Date column (str): Column name for the date
Baseline (decimal):
Reference genome size (int): Reference genome size
Max missing (int): Maximum number of missing data
Seed number (int): Seed
Start date (str): Start date
End date (str): End date
Unit (str): Time unit
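For anyone preparing their own `config` folder, the keep, remove and filter file formats described in the docstring can be sketched as below. This is only an illustration: the helper `write_config_files`, the file names `keep.txt`, `remove.txt` and `filters.tsv`, and the example filter on a `country` column are all hypothetical and are not taken from the module itself.

```python
import csv
from pathlib import Path


def write_config_files(config_dir, keep_ids, remove_ids, filters):
    """Write keep, remove and filter files in the formats the docstring describes.

    The file names used here are hypothetical; the module takes the actual
    names as parameters.
    """
    config_dir = Path(config_dir)
    config_dir.mkdir(parents=True, exist_ok=True)

    # keep / remove files: plain text, one sequence ID per line
    (config_dir / "keep.txt").write_text("".join(f"{i}\n" for i in keep_ids))
    (config_dir / "remove.txt").write_text("".join(f"{i}\n" for i in remove_ids))

    # filter file: tab-delimited with the columns action, column and value
    with open(config_dir / "filters.tsv", "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(["action", "column", "value"])
        writer.writerows(filters)

    return config_dir
```

Calling `write_config_files("config", ["EPI_ISL_402125"], ["EPI_ISL_12345678"], [("exclude", "country", "Atlantis")])` would produce a folder that can then be linked to the `config` port.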
@joetsui1994 Could you take a look at the docstring in particular to ensure that it provides sufficient information for the end user to run the module? Thanks!
@jsbrittain Looks great! I just tested it and it works.
I would suggest a few changes in the docstring: 1) Give either an example for start/end date, or the required format (seems to be YYYY-MM-DD)? 2) What are the options for time unit? (seems to be ['week', 'month', 'year', 'full']) 3) change "Baseline" to "Baseline sampling proportion" 4) "Max" to "Max."
Otherwise this looks good to me!
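To make suggestions 1) and 2) concrete, here is a rough sketch of a pre-flight check on the date and unit parameters. It assumes the YYYY-MM-DD format and the `['week', 'month', 'year', 'full']` options that I inferred from testing; neither has been confirmed against the subsampler source, and `check_time_params` is a hypothetical helper, not part of the module.

```python
from datetime import datetime

# Assumed from testing the module; not confirmed against subsampler itself.
ASSUMED_UNITS = ("week", "month", "year", "full")


def check_time_params(start_date, end_date, unit):
    """Return a list of problems found with the parameters (empty if none)."""
    problems = []
    parsed = {}
    for name, value in (("Start date", start_date), ("End date", end_date)):
        try:
            parsed[name] = datetime.strptime(value, "%Y-%m-%d").date()
        except ValueError:
            problems.append(f"{name} {value!r} is not in YYYY-MM-DD format")
    # Only compare the dates when both were parsed successfully
    if len(parsed) == 2 and parsed["Start date"] > parsed["End date"]:
        problems.append("Start date is after End date")
    if unit not in ASSUMED_UNITS:
        problems.append(f"Unit {unit!r} is not one of {ASSUMED_UNITS}")
    return problems
```

Note that `strptime` also accepts months and days without zero padding, so this is a loose check rather than strict format validation.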
Add the Anderson Brito subsampler (https://github.com/andersonbrito/subsampler)
To test the module:
1. Clone the repository (`git@github.com:jsbrittain/vneyard.git`) and check out this branch (`git checkout sampler_brito`) [this will not be necessary once the module is added to the repository].
2. Add the cloned folder (`vneyard/`) as a local repository.
3. Drag `Subsampler_Brito` into the scene from the `EPI` project. Link a local folder as input using `LinkLocalFolder` (from the `Utility` project). You can use the sample data provided in `vneyard/workflows/EPI/modules/Subsampler_Brito/results/data`; do the same with the `config` folder (note that if you remove the subsampler config filenames from the module then it will run with defaults).
4. `Build and Run` the workflow with the existing default options (or modify and build).
5. `Open Results` to see the output; see below for a screenshot of the workflow with results.

Screenshot:
resolves #2
Improvements that can be made: