Open acope3 opened 1 year ago
I have added the annotation files. @davbunn1 @HannahMaroof, I see that you both were assigned to the issue for creating the L. kluyveri dataset, so I have assigned you to this issue as well. Branch is cope-thermotolerans-120.
Working in branch: "cope-thermotolerans-120"
Genus folder within example-datasets: "thermotolerans"
Strain: Lachancea thermotolerans Y-8284 (existing L. thermotolerans contaminants data on example-datasets from L. kluyveri)
Data source: EIRNA BIO in 2023 (James Keane, Darren Fenton and others)
Transcriptome annotation courtesy of Alex Cope @acope3
Contaminants fasta file created from NCBI data as detailed in provenance file
UMIs (N) and Barcodes (B) used:
Read structure: NNNN - rpf sequence - NNNNN - BBBBB - Adapter
Barcodes: Rep1 - ATCGT, Rep2 - AGCTA, Rep3 - CGTAA
Adapter sequence: AGATCGGAAGAGCACACGTCTGAA
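To make the read layout above concrete, here is a rough Python sketch of how one read could be split into its parts. It is only an illustration of the structure described in this issue, not how riboviz itself does the trimming or demultiplexing. The function name `split_read` and its parameters are made up for this example, and it assumes the full adapter is present with no mismatches.

```python
# Illustrative only: split a read laid out as
#   NNNN - rpf sequence - NNNNN - BBBBB - Adapter
# Adapter and barcodes are taken from this issue; everything else
# (function name, parameter names, exact-match logic) is an assumption.

ADAPTER = "AGATCGGAAGAGCACACGTCTGAA"
BARCODES = {"ATCGT": "Rep1", "AGCTA": "Rep2", "CGTAA": "Rep3"}


def split_read(read, adapter=ADAPTER, umi5_len=4, umi3_len=5, barcode_len=5):
    """Return the UMI, rpf, barcode and sample for one read, or None."""
    adapter_start = read.find(adapter)
    if adapter_start == -1:
        return None  # full adapter not found (partial adapters are not handled)

    insert = read[:adapter_start]
    tail_len = umi3_len + barcode_len
    if len(insert) <= umi5_len + tail_len:
        return None  # nothing left for the rpf sequence

    barcode = insert[-barcode_len:]          # BBBBB
    umi3 = insert[-tail_len:-barcode_len]    # NNNNN
    umi5 = insert[:umi5_len]                 # NNNN
    rpf = insert[umi5_len:-tail_len]         # rpf sequence

    return {
        "umi": umi5 + umi3,
        "rpf": rpf,
        "barcode": barcode,
        "sample": BARCODES.get(barcode),  # None if the barcode is unknown
    }


if __name__ == "__main__":
    example = "ACGT" + "GGGTTTAAACCCGGGTTTAAACCCGGGT" + "TTTTT" + "ATCGT" + ADAPTER
    print(split_read(example))
```

Real trimming tools usually tolerate mismatches and adapters truncated at the 3' end, so this is mainly useful for sanity-checking a handful of reads by eye.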
config.yaml file created (EIRNA_2023_LT_3-samples_cds_250nt_utr_config.yaml) and successfully run on full-sized dataset.
check_fasta_gff pending - currently getting the error: no module named 'pyfaidx'. This persists even after using 'source activate riboviz' which I had thought would load the necessary modules...
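One way to narrow this down is to check which Python interpreter is actually being used after activation and whether it can see pyfaidx. The snippet below is a generic check, not part of riboviz. The helper name `module_available` is made up here, and it says nothing about whether pyfaidx is supposed to be in the riboviz environment.

```python
# Generic diagnostic: which interpreter is running, and can it find a module?
# Run this with the same `python` that is used to launch check_fasta_gff.
import importlib.util
import sys


def module_available(name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    for name in ("pyfaidx",):
        status = "importable" if module_available(name) else "NOT importable"
        print(name + ":", status)
```

If the interpreter path printed is not inside the riboviz environment, the activation probably did not take effect in that shell. If it is inside the environment but pyfaidx is not importable, the package may simply not be installed there.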
Results from the full-sized test run look excellent. Just waiting on check_fasta_gff before putting in a pull request.
Thanks for starting to add a new dataset to example-datasets! This issue template includes the key steps, see add-new-dataset.md. Please edit as needed for your dataset.
Example branch name: cheng-entamoeba-123, if the dataset were generated by Dr. Cheng, from entamoeba, and the new issue ticket is number 123.