riboviz / example-datasets

Example datasets to run with RiboViz
Apache License 2.0

Add new dataset Thermotolerans #120

Open acope3 opened 1 year ago

acope3 commented 1 year ago

Thanks for starting to add a new dataset to example-datasets! This issue template includes the key steps; see add-new-dataset.md. Please edit as needed for your dataset.

acope3 commented 1 year ago

I have added the annotation files. @davbunn1 @HannahMaroof, I see that you both were assigned to the issue for creating the L. kluyveri dataset, so I have assigned you to this issue, as well. Branch is cope-thermotolerans-120.

davbunn1 commented 1 year ago

Working in branch: "cope-thermotolerans-120"
Genus folder within example-datasets: "thermotolerans"
Strain: Lachancea thermotolerans Y-8284 (existing L. thermotolerans contaminants data in example-datasets, from L. kluyveri)
Data source: EIRNA BIO in 2023 (James Keane, Darren Fenton and others)
Transcriptome annotation courtesy of Alex Cope @acope3

davbunn1 commented 1 year ago

Contaminants FASTA file created from NCBI data, as detailed in the provenance file. UMIs (N) and barcodes (B) used:

Read structure: NNNN - RPF sequence - NNNNN - BBBBB - adapter
Barcodes: Rep1 - ATCGT, Rep2 - AGCTA, Rep3 - CGTAA
Adapter sequence: AGATCGGAAGAGCACACGTCTGAA
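To make the read layout concrete, here is a minimal Python sketch (not RiboViz code) that trims the adapter and splits a read into its UMIs, RPF and sample barcode. It assumes exact adapter and barcode matches and that the full adapter is present in the read. The names ADAPTER, BARCODES, READ_PATTERN, trim_adapter and parse_read are hypothetical; dedicated tools such as cutadapt and UMI-tools handle mismatches and partial adapters, which this sketch does not.

```python
import re
from typing import NamedTuple, Optional

# Adapter and barcodes as listed in the comment above.
ADAPTER = "AGATCGGAAGAGCACACGTCTGAA"
BARCODES = {"ATCGT": "Rep1", "AGCTA": "Rep2", "CGTAA": "Rep3"}

# Hypothetical pattern for an adapter-trimmed read:
# 4-nt UMI, RPF insert (greedy), 5-nt UMI, 5-nt barcode at the 3' end.
READ_PATTERN = re.compile(
    r"^(?P<umi_5>.{4})(?P<rpf>.+)(?P<umi_3>.{5})(?P<barcode>.{5})$"
)


class ParsedRead(NamedTuple):
    sample: Optional[str]  # None if the barcode is not one of the three above
    umi: str               # both UMI segments concatenated (9 nt)
    rpf: str


def trim_adapter(seq: str, adapter: str = ADAPTER) -> Optional[str]:
    """Cut at the first exact occurrence of the adapter; None if it is absent."""
    idx = seq.find(adapter)
    return seq[:idx] if idx >= 0 else None


def parse_read(seq: str) -> Optional[ParsedRead]:
    """Split one read into sample, UMI and RPF; None if it does not fit the layout."""
    insert = trim_adapter(seq)
    if insert is None:
        return None
    m = READ_PATTERN.match(insert)
    if m is None:  # insert too short to hold UMIs, barcode and an RPF
        return None
    return ParsedRead(
        sample=BARCODES.get(m.group("barcode")),
        umi=m.group("umi_5") + m.group("umi_3"),
        rpf=m.group("rpf"),
    )
```

Because the RPF group is greedy and the pattern is anchored at both ends, the RPF is everything between the first 4 nt and the last 10 nt of the trimmed read. Concatenating the two UMI segments into one 9-nt UMI is a choice made for this sketch only.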

config.yaml file created (EIRNA_2023_LT_3-samples_cds_250nt_utr_config.yaml) and run successfully on the full-sized dataset.

check_fasta_gff is still pending: I'm currently getting the error "No module named 'pyfaidx'". This persists even after running 'source activate riboviz', which I had thought would load the necessary modules.
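One possible cause (an assumption, not confirmed in this thread) is that the Python interpreter running check_fasta_gff is not the one from the riboviz environment, or that pyfaidx is simply not installed in that environment. A quick way to tell these apart is to ask the interpreter directly where, if anywhere, it would import pyfaidx from. The helper name module_status is hypothetical; it uses only the standard library.

```python
import importlib.util
import sys


def module_status(name: str) -> str:
    """Report whether `name` is importable by *this* interpreter, and from where."""
    spec = importlib.util.find_spec(name)  # None if the top-level module is not found
    where = spec.origin if spec is not None else "not found"
    return f"{name}: {where} (interpreter: {sys.executable})"


if __name__ == "__main__":
    print(module_status("pyfaidx"))
```

Run it with the same `python` that check_fasta_gff uses, with the riboviz environment active. If the interpreter path is outside that environment, the environment is not actually in use; if the path is correct but pyfaidx is "not found", installing pyfaidx into that environment (it is available from PyPI and bioconda) would be a reasonable next step.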

davbunn1 commented 1 year ago

Results from the full-sized test run look excellent. Just waiting on check_fasta_gff before opening a pull request.