lfaino / LoReAn

Long Reads Annotation pipeline
MIT License

Dictionary error. #9

Closed iwangtoknow closed 6 years ago

iwangtoknow commented 6 years ago

Hi Luigi, I ran the Docker image and got this error:

Traceback (most recent call last):
  File "/opt/LoReAn/code/lorean.py", line 538, in <module>
    main()
  File "/opt/LoReAn/code/lorean.py", line 420, in main
    gmap_wd, args.adapter, threads_use, a=True)
  File "/opt/LoReAn/code/manipulateSeq.py", line 271, in filterLongReads
    SeqIO.write(final_seq, out_filename, "fasta")
  File "/usr/local/lib/python3.5/dist-packages/Bio/SeqIO/__init__.py", line 477, in write
    with as_handle(handle, mode) as fp:
  File "/usr/lib/python3.5/contextlib.py", line 59, in __enter__
    return next(self.gen)
  File "/usr/local/lib/python3.5/dist-packages/Bio/File.py", line 88, in as_handle
    with open(handleish, mode, **kwargs) as fp:
FileNotFoundError: [Errno 2] No such file or directory: '/data/LoReAn_20180716/run//gmap_output/Data/all_long_reads.fq.longreads.filtered.fasta' 

I checked the code and the directory from my last successful run, and I realized that the problem is caused by the path I gave for the PacBio subreads FASTQ file, /Data/all_long_reads.fq.
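The path in the error message suggests how the failure arises: the output file name appears to be built by appending the long-reads path, directory part included, to the gmap_output directory, so a nested Data/ directory is expected to exist there. Below is a minimal Python sketch of that behaviour and of one possible fix, keeping only the base name. Both function names are hypothetical and are not taken from manipulateSeq.py.

```python
import os

SUFFIX = ".longreads.filtered.fasta"


def naive_output_path(work_dir, long_reads):
    """Append the input path as given, keeping any directory part of the input."""
    # The leading "/" is stripped so that the result stays under work_dir.
    return os.path.join(work_dir, long_reads.lstrip("/") + SUFFIX)


def safe_output_path(work_dir, long_reads):
    """Use only the file name of the input, so no nested directory is needed."""
    return os.path.join(work_dir, os.path.basename(long_reads) + SUFFIX)
```

With the first function the output can only be written if work_dir/Data already exists; with the second it lands directly in work_dir.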


Another question: should I provide a polished long-read FASTQ file? I use PacBio Iso-Seq. Thanks.

lfaino commented 6 years ago

Hi, when did you download the image? Which image are you using? I made some changes, and these may be causing the problem.

Cheers Luigi

iwangtoknow commented 6 years ago

Hi Luigi,

lfaino/lorean   <none>   153ab8ea611e   6 weeks ago   32.9GB

This is all I can provide. I just pulled the latest iprscan_rpMask image, so the tag of the former image changed to <none>.


Can LoReAn now strip the directory path from the input file name automatically? Thanks, WANG

lfaino commented 6 years ago

Hi, can you remove the image and run it again? I made a new image, so you can check whether the problem is solved.

Cheers luigi

iwangtoknow commented 6 years ago

Hi Luigi, A new error occurred.

PASA configuration file existed already: /data/LoReAn_20180716/run/PASA/alignAssembly.config --- skipping
    ###LOADING GFF3 FILE INTO DATABASE###
    ###UPDATING GFF3 FILE###
    ###PARSING OUTPUT###
Traceback (most recent call last):
  File "/opt/LoReAn/code/lorean.py", line 550, in <module>
    main()
  File "/opt/LoReAn/code/lorean.py", line 397, in main
    final_keep = grs.genename_last(final_output, args.prefix_gene, args.verbose, pasa_dir, dict_ref_name, "pasa")
  File "/opt/LoReAn/code/getRightStrand.py", line 173, in genename_last
    db1 = gffutils.create_db(out.name, ':memory:', merge_strategy='create_unique', keep_order=True, transform=transform_name)
  File "/usr/local/lib/python3.5/dist-packages/gffutils/create.py", line 1288, in create_db
    c.create()
  File "/usr/local/lib/python3.5/dist-packages/gffutils/create.py", line 504, in create
    self._populate_from_lines(self.iterator)
  File "/usr/local/lib/python3.5/dist-packages/gffutils/create.py", line 569, in _populate_from_lines
    for i, f in enumerate(lines):
  File "/usr/local/lib/python3.5/dist-packages/gffutils/iterators.py", line 103, in __iter__
    i = self.transform(i)
  File "/opt/LoReAn/code/getRightStrand.py", line 231, in transform_name
    fatureattgene = prefix_name + '_' + chrold + '_G_' + str(gene_count)
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

Which one do you think could be None?

Thx WANG
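The wording of the TypeError narrows this down. Python evaluates a chain of + from left to right, and "unsupported operand type(s) for +: 'NoneType' and 'str'" is raised only when the left operand is None, which in this expression can only be prefix_name in prefix_name + '_'. If chrold were None, the left operand would be a str and the message would be different. The sketch below reproduces this; the guard and its default prefix are hypothetical and not part of LoReAn.

```python
def gene_name(prefix_name, chrold, gene_count):
    """Same expression as in the traceback (getRightStrand.py, transform_name)."""
    return prefix_name + '_' + chrold + '_G_' + str(gene_count)


def gene_name_guarded(prefix_name, chrold, gene_count, default_prefix="gene"):
    """Hypothetical guard: fall back to a default when no prefix was given."""
    if prefix_name is None:
        prefix_name = default_prefix
    return gene_name(prefix_name, chrold, gene_count)


def error_message(prefix_name, chrold):
    """Return the TypeError text raised for the given inputs, or None if no error."""
    try:
        gene_name(prefix_name, chrold, 1)
    except TypeError as exc:
        return str(exc)
    return None
```

Calling error_message(None, "chr1") returns the message from the traceback, while error_message("An", None) returns a different one, so the missing value is the gene prefix.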

lfaino commented 6 years ago

Hi Wang, I think I found the error. A new image should be ready to download in about 3 hours, and you can test it. Let me know.

Cheers Luigi

iwangtoknow commented 6 years ago

Hi Luigi, I added the -n parameter and the None value went away. I have finished a run, but only after moving the assembly and RNA data into the working directory, so maybe the problem is fixed, but I'm not sure. LoReAn worked fine on my fungus. Thanks for your excellent work. Maybe next you could add functional annotation steps such as eggNOG. That could be remote. And some comparative analysis.

WANG

iwangtoknow commented 6 years ago

I got a few bad genes, and I'm trying to fix them manually. I'll try the new image and report back.

iwangtoknow commented 6 years ago

2854: no UTR; the 5' end should be extended.
10623: no UTR; the 5' end should be extended.
5713: bad gene.
6339: not sure.
6059: not sure.
4619: not sure.
8905: seems to have no problem, yet it is in the bad list?
9837: mRNA9837 may be a pseudo mRNA.

lfaino commented 6 years ago

Hi Wang, can you send me the command that you use to run LoReAn? Can you also tell me which chemistry you used to sequence the cDNA?

Cheers Luigi

iwangtoknow commented 6 years ago

lorean.py \
-pr /data/ProteinEvidence/Asp_PDB_Uniprot6090.fasta \
-sr left.fq.right.fq \
-lr all_long_reads.fq \
-sp aspergillus_nidulans \
-mg -iprs -f -k -t 10 -q 3000 -n An \
BJCHM5.fa

left.fq and right.fq are my short-read transcript data, sequenced on an Illumina NovaSeq (PE150). I have 3 samples, and I concatenated them into left.fq and right.fq.

all_long_reads.fq is my long-read transcript data, sequenced on a PacBio RS II. I combined the data from 2 cells; the first cell produced only a small number of reads because of the sequencing sample, and the second cell produced more. All long reads are *subreads.fq; I have now polished my reads with IsoSeq3.

My genome was also sequenced on the RS II, using 2 cells. My short-read transcript data (>60 million reads) have a unique mapping rate of >95% against my genome assembly.

The long subreads have many gaps and SNPs, which is normal. Thanks, Luigi.

P.S. I'm still downloading the latest iprscan_rpMask image; the download has failed 5 times because of network connection errors. WANG

lfaino commented 6 years ago

Hi Wang, I would suggest using the option "-d" for your experiment with the image that you are downloading. It will automatically try to find the sequencing adapter and orient the reads by strand.

For your experiment, I would suggest using CCS reads. I had very good results with CCS data.

Let me know Luigi

iwangtoknow commented 6 years ago

Thanks Luigi, I use IsoSeq3! (Image of the pipeline)

Should I use the polished reads or, as you recommend, the CCS data?

iwangtoknow commented 6 years ago

Luigi, sorry, I didn't understand the question about chemistry in your last reply. Both of my transcriptome sequencing experiments used a general eukaryotic library, NOT a strand-specific eukaryotic transcriptome library. Can I still add the -d option?

lfaino commented 6 years ago

Hi Wang, yes, you can use it. I thought that your sequencing was done with Nanopore and not PacBio; that is why I asked about the chemistry. Can you see poly-A tails in your sequencing output? The command would be:

lorean.py \
-d \
-pr /data/ProteinEvidence/Asp_PDB_Uniprot6090.fasta \
-sr left.fq.right.fq \
-lr all_long_reads.fq \
-sp aspergillus_nidulans \
-mg -iprs -f -k -t 10 -q 3000 -n An \
BJCHM5.fa

lfaino commented 6 years ago

Hi Wang, a few words on the BAD GENES list. These are genes that do not start with M. I called them BAD GENES, but they can be good genes.

Cheers Luigi
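As an illustration of this criterion, the sketch below flags coding sequences whose first codon is not ATG, the standard codon translated to M. It assumes the CDS sequences are already available as a dictionary of strings. The function name and the example identifiers are hypothetical, and this is not necessarily how LoReAn builds its list.

```python
def genes_not_starting_with_m(cds_by_gene):
    """Return the IDs whose coding sequence does not begin with the ATG (M) codon."""
    flagged = []
    for gene_id, cds in sorted(cds_by_gene.items()):
        # Case-insensitive check of the first three bases only.
        if not cds.upper().startswith("ATG"):
            flagged.append(gene_id)
    return flagged


# Made-up sequences, for illustration only.
example = {"g1": "ATGAAATAA", "g2": "CTGAAATAA", "g3": "atgccc"}
print(genes_not_starting_with_m(example))
```

A gene flagged this way may still be correct, for example when the true start lies upstream of the annotated one, which fits the "5' should be extended" cases reported earlier in the thread.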

iwangtoknow commented 6 years ago

Thanks Luigi. Yes, I can see primer+TTTTTTTTTTTT in almost every sequence. The sequencing library is polyA-enriched. You have helped me a lot indeed. WANG
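To put a number on "almost every sequence", one could count the reads that contain a long run of T near their start. The sketch below is only one assumed way to do that: the run length of 12, the 100-base window and all function names are illustrative choices, and the FASTQ reader expects plain four-line records.

```python
import re


def read_fastq_sequences(path):
    """Yield the sequence line of each four-line FASTQ record."""
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:
                yield line.strip()


def has_poly_t(sequence, min_run=12, window=100):
    """True if the first `window` bases contain at least `min_run` consecutive Ts."""
    return re.search("T{%d,}" % min_run, sequence[:window].upper()) is not None


def poly_t_fraction(sequences, min_run=12, window=100):
    """Fraction of sequences with a poly-T run near the start (0.0 if there are none)."""
    sequences = list(sequences)
    if not sequences:
        return 0.0
    hits = sum(has_poly_t(seq, min_run, window) for seq in sequences)
    return hits / len(sequences)
```

For a file on disk, poly_t_fraction(read_fastq_sequences("all_long_reads.fq")) would give the fraction for the whole read set.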

iwangtoknow commented 6 years ago

Hi Luigi, the directory error is still there.

###RUNNING iASSEMBLER   18:11:40 25-07  ###
### STARTING ADAPTER ALIGNMENT AND READS ORIENTATION ###
100%|#################################| 224792/224792 [01:02<00:00, 3572.21it/s]
Traceback (most recent call last):
  File "/opt/LoReAn/code/lorean.py", line 550, in <module>
    main()
  File "/opt/LoReAn/code/lorean.py", line 431, in main
    args.max_intron_length, args.verbose, args.stranded)
  File "/opt/LoReAn/code/manipulateSeq.py", line 171, in filterLongReads
    filter_count = align.adapter_alignment(out_filename, adapter_aaa, scoring, align_score_value, out_filename_oriented, threads, min_length)
  File "/opt/LoReAn/code/align.py", line 94, in adapter_alignment
    with open(out_filename, "w") as output_handle:
FileNotFoundError: [Errno 2] No such file or directory: '/data/LoReAn_20180723/run//gmap_output/Data/long_reads/all_longreads.fq.longreads.filtered.oriented.fasta'

WANG

lfaino commented 6 years ago

Hi Wang, can you check whether the file /data/LoReAn_20180723/run//gmap_output/Data/long_reads/all_longreads.fq.longreads.filtered.oriented.fasta exists and is not empty?

Cheers Luigi
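A quick way to answer both parts of that question at once is sketched below in Python; file_status is a hypothetical helper name.

```python
import os


def file_status(path):
    """Return 'missing', 'empty' or 'ok' for the given path."""
    if not os.path.isfile(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    return "ok"


print(file_status(
    "/data/LoReAn_20180723/run//gmap_output/Data/long_reads/"
    "all_longreads.fq.longreads.filtered.oriented.fasta"
))
```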

iwangtoknow commented 6 years ago

Hi Luigi, I checked: the file all_longreads.fq.longreads.filtered.oriented.fasta does NOT exist, and the directory Data does NOT exist either. I provided the long reads with -lr Data/long_reads/all_longreads.fq, so I think you could strip the directory part and write the output to /data/LoReAn_20180723/run//gmap_output/all_longreads.fq.longreads.filtered.oriented.fasta.

After the error occurred, I made a symbolic link from Data/long_reads/all_longreads.fq to ./all_longreads. Running lorean.py -lr all_longreads.fq works fine. WANG
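The workaround described here can be sketched in Python as follows, assuming lorean.py is started from the directory that holds the link. link_into_dir is a hypothetical helper name.

```python
import os


def link_into_dir(source, link_dir="."):
    """Create a symlink to `source` inside `link_dir` and return the link's file name."""
    link_name = os.path.basename(source)
    link_path = os.path.join(link_dir, link_name)
    if not os.path.lexists(link_path):
        # An absolute target keeps the link valid wherever link_dir is.
        os.symlink(os.path.abspath(source), link_path)
    return link_name
```

The returned name contains no directory part, so it can be passed to -lr without triggering the nested output path.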

iwangtoknow commented 6 years ago

Hi Luigi, I finished a second run. As you suggested, I added -d and provided CCS long reads. A lot of additional genes.

LOREAN GFF3 STATS | -d and ccs | last run
------------ | ------------- | -------------
parsed genome node DAGs | 10169 | 10378
sequence regions | 21 (total length: 31038331) | 22 (total length: 31036420)
genes | 10147 | 10355
protein-coding genes | 10146 | 10355
mRNAs | 11453 | 10695
protein-coding mRNAs | 11452 | 10694
exons | 38142 | 33609
CDSs | 36273 | 32932
introns | 80067 | 68742

WANG
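One way to read these figures is as mRNAs per gene, which indicates how many alternative transcripts each run reports on average. The sketch below computes that ratio from the gene and mRNA counts in the statistics above; the run labels are copied from the header of the statistics.

```python
# Gene and mRNA counts copied from the GFF3 statistics in the comment above.
runs = {
    "-d and ccs": {"genes": 10147, "mRNAs": 11453},
    "last run": {"genes": 10355, "mRNAs": 10695},
}


def mrnas_per_gene(stats):
    """Average number of annotated mRNAs per gene."""
    return stats["mRNAs"] / stats["genes"]


for label, stats in runs.items():
    print("{}: {:.2f} mRNAs per gene".format(label, mrnas_per_gene(stats)))
```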

lfaino commented 6 years ago

Are you satisfied with the results? Did you check the alignment of the long reads on the genome?

Cheers Luigi