karel-brinda / rnftools

RNF framework for NGS: simulation of reads, evaluation of mappers, conversion of RNF-compliant data.
http://karel-brinda.github.io/rnftools
MIT License

Wrong read name - more than 2 reads in tuple? #77

Open rsuchecki opened 5 years ago

rsuchecki commented 5 years ago

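I am using rnftools version 0.3.1.3 and I am getting a puzzling error. The sample is declared with reads_in_tuple=2, yet the read name in the error carries four segments and the tag reads-in-tuple:4. Why would there be more than 2 reads in a tuple in this case?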

Error message:

Error in rule 2:
      jobid: 0
      output: A_thaliana_TAIR10_chr1_with_gff_ArtIllumina_reads.1.fq, A_thaliana_TAIR10_chr1_with_gff_ArtIllumina_reads.2.fq

  RuleException:
  ValueError in line 32 of /usr/local/lib/python3.6/site-packages/rnftools/mishmash/mishmash.snake:
  Wrong read name '__03__(01,02,F,000000512,000000611),(01,02,R,000000701,000000800),(01,02,F,000000877,000000976),(01,02,R,000001055,000001154)__[art-illumina,reads-in-tuple:4]/3'.
    File "/usr/local/lib/python3.6/site-packages/rnftools/mishmash/mishmash.snake", line 32, in __rule_4
    File "/usr/local/lib/python3.6/site-packages/rnftools/mishmash/Sample.py", line 90, in create_fq
    File "/usr/local/lib/python3.6/site-packages/rnftools/rnfformat/FqMerger.py", line 132, in run
    File "/usr/local/lib/python3.6/site-packages/rnftools/rnfformat/FqMerger.py", line 251, in save_read
    File "/usr/local/lib/python3.6/concurrent/futures/thread.py", line 56, in run
  Exiting because a job execution failed. Look above for error message
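
To make the mismatch explicit, the read name from the error can be taken apart with a few lines of Python. This is only a sketch: it assumes the name follows the layout `prefix__id__segments__suffix`, with each segment written as `(genome,chromosome,direction,left,right)`. The helper `summarize_read_name` is hypothetical and is not part of rnftools.

```python
import re

# Read name copied verbatim from the error message above.
READ_NAME = (
    "__03__(01,02,F,000000512,000000611),(01,02,R,000000701,000000800),"
    "(01,02,F,000000877,000000976),(01,02,R,000001055,000001154)"
    "__[art-illumina,reads-in-tuple:4]/3"
)

# Assumed segment layout: (genome, chromosome, direction, left, right).
SEGMENT_RE = re.compile(r"\((\d+),(\d+),([FRN]),(\d+),(\d+)\)")


def summarize_read_name(name):
    """Split a read name into its parts (hypothetical helper, not rnftools API)."""
    # Assumed layout: prefix__id__segments__suffix, optionally followed by /N.
    prefix, tuple_id, segments, rest = name.split("__", 3)
    suffix, sep, read_number = rest.rpartition("/")
    if not sep:  # no "/N" part present
        suffix, read_number = rest, ""
    parsed = [
        (int(g), int(c), d, int(left), int(right))
        for g, c, d, left, right in SEGMENT_RE.findall(segments)
    ]
    declared = re.search(r"reads-in-tuple:(\d+)", suffix)
    return {
        "tuple_id": tuple_id,
        "segments": parsed,
        "segments_found": len(parsed),
        "declared_reads_in_tuple": int(declared.group(1)) if declared else None,
        "read_number": read_number,
    }


summary = summarize_read_name(READ_NAME)
print(summary["segments_found"], summary["declared_reads_in_tuple"], summary["read_number"])
```

For this name the sketch finds four segments, a declared `reads-in-tuple` of 4, and a read number of 3. All three conflict with the `reads_in_tuple=2` set in the Snakefile.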

Snakefile:

import rnftools
rnftools.mishmash.sample("A_thaliana_TAIR10_chr1_with_gff_ArtIllumina_reads",reads_in_tuple=2)
rnftools.mishmash.ArtIllumina(
        fasta="A_thaliana_TAIR10_chr1_with_gff.transcripts.fa",
        rng_seed=1,
        coverage=0.1,
        distance=300,
        distance_deviation=50,
        read_length_1=100,
        read_length_2=100
)
include: rnftools.include()
rule: input: rnftools.input()

Input FASTA:

>mRNA0010363005899
AAATTATTAGATATACCAAACCAGAGAAAACAAATACATAATCGGAGAAATACAGATTACAGAGAGCGAGAGAGATCGACGGCGAAGCTCTTTACCCGGAAACCATTGAAATCGGACGGTTTAGTGAAAATGGAGGATCAAGTTGGGTTTGGGTTCCGTCCGAACGACGAGGAGCTCGTTGGTCACTATCTCCGTAACAAAATCGAAGGAAACACTAGCCGCGACGTTGAAGTAGCCATCAGCGAGGTCAACATCTGTAGCTACGATCCTTGGAACTTGCGCTGTAAGTTCCGAATTTTCTGAATTTCATTTGCAAGTAATCGATTTAGGTTTTTGATTTTAGGGTTTTTTTTTGTTTTGAACAGTCCAGTCAAAGTACAAATCGAGAGATGCTATGTGGTACTTCTTCTCTCGTAGAGAAAACAACAAAGGGAATCGACAGAGCAGGACAACGGTTTCTGGTAAATGGAAGCTTACCGGAGAATCTGTTGAGGTCAAGGACCAGTGGGGATTTTGTAGTGAGGGCTTTCGTGGTAAGATTGGTCATAAAAGGGTTTTGGTGTTCCTCGATGGAAGATACCCTGACAAAACCAAATCTGATTGGGTTATCCACGAGTTCCACTACGACCTCTTACCAGAACATCAGGTTTTCTTCTATTCATATATATATATATATATATATGTGGATATATATATATGTGGTTTCTGCTGATTCATAGTTAGAATTTGAGTTATGCAAATTAGAAACTATGTAATGTAACTCTATTTAGGTTCAGCAGCTATTTTAGGCTTAGCTTACTCTCACCAATGTTTTATACTGATGAACTTATGTGCTTACCTCCGGAAATTTTACAGAGGACATATGTCATCTGCAGACTTGAGTACAAGGGTGATGATGCGGACATTCTATCTGCTTATGCAATAGATCCCACTCCCGCTTTTGTCCCCAATATGACTAGTAGTGCAGGTTCTGTGGTGAGTCTTTCTCCATATACACTTAGCTTTGAGTAGGCAGATCAAAAAAGAGCTTGTGTCTACTGATTTGATGTTTTCCTAAACTGTTGATTCGTTTCAGGTCAACCAATCACGTCAACGAAATTCAGGATCTTACAACACTTACTCTGAGTATGATTCAGCAAATCATGGCCAGCAGTTTAATGAAAACTCTAACATTATGCAGCAGCAACCACTTCAAGGATCATTCAACCCTCTCCTTGAGTATGATTTTGCAAATCACGGCGGTCAGTGGCTGAGTGACTATATCGACCTGCAACAGCAAGTTCCTTACTTGGCACCTTATGAAAATGAGTCGGAGATGATTTGGAAGCATGTGATTGAAGAAAATTTTGAGTTTTTGGTAGATGAAAGGACATCTATGCAACAGCATTACAGTGATCACCGGCCCAAAAAACCTGTGTCTGGGGTTTTGCCTGATGATAGCAGTGATACTGAAACTGGATCAATGGTAAGCTTTTTTTACTCATATATAATCACAACCTATATCGCTTCTATATCTCACACGCTGAATTTTGGCTTTTAACAGATTTTCGAAGACACTTCGAGCTCCACTGATAGTGTTGGTAGTTCAGATGAACCGGGCCATACTCGTATAGATGATATTCCATCATTGAACATTATTGAGCCTTTGCACAATTATAAGGCACAAGAGCAACCAAAGCAGCAGAGCAAAGAAAAGGTTTAACACTCTCACTGAGAAACATGACTTTGATACGAAATCTGAATCAACATTTCATCAAAAAGATTTAGTCAAATGACCTCTAAATTATGAGCTATGGGTCTGCTTTCAGGTGATAAGTTCGCAGAAAAGCGAATGCGAGTGGAAAATGGCTGAAGACTCGATCAAGATACCTCCATCCACCAACACGGTGAAGCAGAGCTGGATTGTTTTGGAGAATGCACAGTGGAACTATCTCAAGAACATGATCATTGGTGTCTTGTTGTTCATCTCCGTCATTAGTTGGATCATTCTTGTTGGTTAA
GAGGTCAAATCGGATTCTTGCTCAAAATTTGTATTTCTTAGAATGTGTGTTTTTTTTTGTTTTTTTTTCTTTGCTCTGTTTTCTCGCTCCGGAAAAGTTTGAAGTTATATTTTATTAGTATGTAAAGAAGAGAAAAAGGGGGAAAGAAGAGAGAAGAAAAATGCAGAAAATCATATATATGAATTGGAAAAAAGTATATGTAATAATAATTAGTGCATCGTTTTGTGGTGTAGTTTATATAAATAAAGTGATATATAGTCTTGTATAAG
>mRNA0010678708737
TTAGTAAGGTCTAATTCAATTTTTGGTGGCGATAATATTTGGCTTAGTCATAAAATACAGTATGGTATAATAATGTAAAGGTTTCTCTTATCTTCAAACCAAAAGACTATACTGGAAGCTGATGGGATCATACGATTCTGAAAAAATAAGACATATATTGCAACAGAGATCCAATTTGTATCAAAAATATTGTCGGCTCAAAAATCTGACCCACCAAGAATCTAATCAAGTGCGCGATTAAGCATACGGCTATGCATCTGGTCATTGTTGATTCAGTCATCACTGGTTTAAAGACAAACTTGCATTGTGAGATTCCAAAATAACAACAACAAAAAACAATTTGCATTGAGAACATTTTGAAGGTCTGACCTTTAAGAGCCATGGAGTTTGATGTTAAGAGAAGTATATCGACAAAAAAAATCACTGACATTGGGAATTCCCATACCTGTAATAACAAAGATTCTCTATTTTTGAGCAAAGAGACATAACACCATGTTTAATCAATACAAGAAACTTTAGTGCATGATTTATGGAATGCTTAAGAAGTTTGGAACTTCAATATTAGGAATTAAGTGAGAGGTAAAGCTACAACATACCAACATCGCAAGCAGAAATATCTTGAAGTAACTAGAGATGAATATCCCCAACATAATCTCTCTTCTTCTGCAAAGTTTTTAAAAAAAATTATACATAAACAATCTTCAGGTGACAGAAGTCTGAGATCTTTGATGAAAAACTCATAATAAGAACTAAGAAGAAGAAGAATCAGTAATCACCTGGAAACTTCATTTAGCAAACCCTTAGTCGCAATGGCAAAAGAGATGATAAATGCAGCGTTTGCAGATAAGACACCAATCAGAACCTGTTTCAAATGCGAAATTATTACCCTTTCTAAACAATCTCAATGACTTAAATCATTTAAACCTTAAAGGAAAAAAAATCTAATTAAGTCCATTAAAAAGAAACGATCTAACCTTTATAGATAGAAGAACAGGGCTATCAGAAAAGCTCGATTCTTCATCACTTTTTCTCAGTAGCAAGCTTCTATCTATGATGAGTTCAGGGCTTTTCAAATAAGTTTGCCGATGAATCTCACCAACTACACATCTGCTAGCTACACTTTGATAGTAAAAGATTATAAAACAAAAGGATACAACAGTCTAGAAGAAGATAGGCGAAGACCAACTTCCACAACAGATGCTGCACACACACACACAAAAAAAAAAAAAGAACCCAACAATTCTTATTGGATCAGAGACTACTCAATATCCCCAAACTTGGAAATTAGTTTGTTGCTTGAGGTCTAAGATACTTCTATATATGGAAAAAGATTTTCAAAGCCAGATATTTCCACAAGTTTGTAATATCAATTCAAGATAAGAGAGCTAGAATCAGACAGGAACTAGCAATGCTTGAAATCAAGAACTTGAATTGAAATAGTTTTTTACCTGAATATTGACAGTTGCTGGATTAATTGCATTGTAGAGGACGTGTCTATATACCTTTGGTCTGTGAAGGATTAAATCGATGAAAATAATCTGCCAAAGAAAACAATTAAAGAACCAAAAACCAAAATTGGAAAGAAATAGGGAAACACCCAAAAAGGGAAAGAAAGTGATTAAAACAGACCATGCGTTCACACTCGATGTACTCATCTGCTACTTCCTTGCAATTTCCCTAAATATAACAATATGATCAAAGATGGAAACTTTGAAGAAATTTAATAGAGAATCTTATAAACCCTAATTGGGTCAAAGAAGATCCATTAATACAAAAATCTTACGCATTTCATGAGACGAATGTTACCCGGAGAGTATTGAATGAACAATGACTTTACCCTAAAACCACATCCCACGCATCTGTGTTCACTCGCCGCCATTGCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCAAGAGAAGAAGAATACGGAGCAATTAGAGTCCGGGTCT
>mRNA0010678708737
TTAGTAAGGTCTAATTCAATTTTTGGTGGCGATAATATTTGGCTTAGTCATAAAATACAGTATGGTATAATAATGTAAAGGTTTCTCTTATCTTCAAACCAAAAGACTATACTGGAAGCTGATGGGATCATACGATTCTGAAAAAATAAGACATATATTGCAACAGAGATCCAATTTGTATCAAAAATATTGTCGGCTCAAAAATCTGACCCACCAAGAATCTAATCAAGTGCGCGATTAAGCATACGGCTATGCATCTGGTCATTGTTGATTCAGTCATCACTGGTTTAAAGACAAACTTGCATTGTGAGATTCCAAAATAACAACAACAAAAAACAATTTGCATTGAGAACATTTTGAAGGTCTGACCTTTAAGAGCCATGGAGTTTGATGTTAAGAGAAGTATATCGACAAAAAAAATCACTGACATTGGGAATTCCCATACCTGTAATAACAAAGATTCTCTATTTTTGAGCAAAGAGACATAACACCATGTTTAATCAATACAAGAAACTTTAGTGCATGATTTATGGAATGCTTAAGAAGTTTGGAACTTCAATATTAGGAATTAAGTGAGAGGTAAAGCTACAACATACCAACATCGCAAGCAGAAATATCTTGAAGTAACTAGAGATGAATATCCCCAACATAATCTCTCTTCTTCTGCAAAGTTTTTAAAAAAAATTATACATAAACAATCTTCAGGTGACAGAAGTCTGAGATCTTTGATGAAAAACTCATAATAAGAACTAAGAAGAAGAAGAATCAGTAATCACCTGGAAACTTCATTTAGCAAACCCTTAGTCGCAATGGCAAAAGAGATGATAAATGCAGCGTTTGCAGATAAGACACCAATCAGAACCTGTTTCAAATGCGAAATTATTACCCTTTCTAAACAATCTCAATGACTTAAATCATTTAAACCTTAAAGGAAAAAAAATCTAATTAAGTCCATTAAAAAGAAACGATCTAACCTTTATAGATAGAAGAACAGGGCTATCAGAAAAGCTCGATTCTTCATCACTTTTTCTCAGTAGCAAGCTTCTATCTATGATGAGTTCAGGGCTTTTCAAATAAGTTTGCCGATGAATCTCACCAACTACACATCTGCTAGCTACACTTTGATAGTAAAAGATTATAAAACAAAAGGATACAACAGTCTAGAAGAAGATAGGCGAAGACCAACTTCCACAACAGATGCTGCACACACACACACAAAAAAAAAAAAAGAACCCAACAATTCTTATTGGATCAGAGACTACTCAATATCCCCAAACTTGGAAATTAGTTTGTTGCTTGAGGTCTAAGATACTTCTATATATGGAAAAAGATTTTCAAAGCCAGATATTTCCACAAGTTTGTAATATCAATTCAAGATAAGAGAGCTAGAATCAGACAGGAACTAGCAATGCTTGAAATCAAGAACTTGAATTGAAATAGTTTTTTACCTGAATATTGACAGTTGCTGGATTAATTGCATTGTAGAGGACGTGTCTATATACCTTTGGTCTGTGAAGGATTAAATCGATGAAAATAATCTGCCAAAGAAAACAATTAAAGAACCAAAAACCAAAATTGGAAAGAAATAGGGAAACACCCAAAAAGGGAAAGAAAGTGATTAAAACAGACCATGCGTTCACACTCGATGTACTCATCTGCTACTTCCTTGCAATTTCCCTAAATATAACAATATGATCAAAGATGGAAACTTTGAAGAAATTTAATAGAGAATCTTATAAACCCTAATTGGGTCAAAGAAGATCCATTAATACAAAAATCTTACGCATTTCATGAGACGAATGTTACCCGGAGAGTATTGAATGAACAATGACTTTACCCTAAAACCACATCCCACGCATCTGTGTTCACTCGCCGCCATTGCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCAAGAGAAGAAGAATACGGAGCAATTAGAGTCCGGGTCT
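
Note that the record `>mRNA0010678708737` appears twice in the FASTA above. Whether duplicated record names are what produces a 4-read tuple is an assumption and has not been checked against the rnftools source, but it is cheap to test. The sketch below lists record names that occur more than once. The function `duplicated_fasta_names` is a hypothetical helper, and the sequences in the example are shortened placeholders for the full ones above.

```python
from collections import Counter


def duplicated_fasta_names(lines):
    """Return FASTA record names (first word after '>') seen more than once."""
    names = [
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    ]
    counts = Counter(names)
    return [name for name, n in counts.items() if n > 1]


# Headers as in the input FASTA above; sequences shortened for illustration.
example = [
    ">mRNA0010363005899", "AAATTATTAG",
    ">mRNA0010678708737", "TTAGTAAGGT",
    ">mRNA0010678708737", "TTAGTAAGGT",
]
print(duplicated_fasta_names(example))  # ['mRNA0010678708737']
```

The same function accepts an open file handle, for example one opened on `A_thaliana_TAIR10_chr1_with_gff.transcripts.fa`. If it reports duplicates, re-running the simulation on a de-duplicated FASTA would show whether the error goes away.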