nmdp-bioinformatics / pipeline

Consensus assembly and allele interpretation pipeline.
GNU Lesser General Public License v3.0

Would be nice if tutorial integrated minimal test data #51

Closed by gturenchalk 9 years ago

gturenchalk commented 9 years ago

For new users attempting the tutorial for the first time, it would be nice if it integrated some minimal test data as part of the process.

ckennedy-nmdp commented 9 years ago

Hi Greg, I agree. Our attempt at this was the single-sample SRA data. Were you thinking something else?

gturenchalk commented 9 years ago

When I ran "prep_data_dir.bash" before looking at the code, I naively expected that the tutorial data would be part of the GitHub content and would be automatically available to the script relative to a git clone.

I see that the script has a variable that can be set to point to the data:

TUTORIAL_DIR=/opt/data/tutorial/fastq

I'm not really picky about the data itself; it depends on how complex the issues you would like to demonstrate with the tutorial are. I would be satisfied with a simple example that has enough content to survive the process and produce an expected result at the end. Single-sample SRA data sounds fine to me, but unless I missed something, I didn't see any data as part of the GitHub content.

If your primary use case actually presumes the user will provide their own data, and you don't want to make tutorial data automatically install, it still might be nice to provide a flag for the script to trigger the install of the data.
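As a rough sketch of what such a flag could look like: the option names `--with-tutorial-data` and `--tutorial-dir` below are made up for illustration and are not part of the current script, and the install step is only a placeholder message. The only thing taken from the existing script is the `TUTORIAL_DIR` default.

```shell
#!/usr/bin/env bash
# Hypothetical sketch only; this is not the actual prep_data_dir.bash.

# Same default as the existing variable, overridable from the environment.
TUTORIAL_DIR="${TUTORIAL_DIR:-/opt/data/tutorial/fastq}"
INSTALL_TUTORIAL_DATA=0   # assumed opt-in switch, off by default

# Parse the (assumed) command-line options.
parse_args() {
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --with-tutorial-data)
        INSTALL_TUTORIAL_DATA=1
        ;;
      --tutorial-dir)
        [ "$#" -ge 2 ] || { echo "--tutorial-dir needs a path" >&2; return 1; }
        TUTORIAL_DIR="$2"
        shift
        ;;
      *)
        echo "unknown option: $1" >&2
        return 1
        ;;
    esac
    shift
  done
}

parse_args "$@" || exit 1

if [ "$INSTALL_TUTORIAL_DATA" -eq 1 ]; then
  # Placeholder: the real step would fetch or unpack the tutorial data here.
  echo "would install tutorial data into $TUTORIAL_DIR"
fi
```

Users who bring their own data would run the script as before, while a first-time user could pass `--with-tutorial-data` to get a working example without editing the script.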

ckennedy-nmdp commented 9 years ago

Compressed simulated read data from IMGT-HLA references is now provided in tutorial/raw. I will update the tutorial for end-to-end smoke testing.