rvolden / Mandalorion-Episode-II

Version II of Mandalorion
MIT License

invalid literal? #8

Closed (morellr closed 5 years ago)

morellr commented 5 years ago

Hi, when running defineAndQuantifyWrapper.py, the call to createConsensi.py generates this error:

Traceback (most recent call last):
  File "/usr/local/bin/createConsensi.py", line 221, in <module>
    corrected_consensus, repeats = determine_consensus(name, fasta, fastq)
  File "/usr/local/bin/createConsensi.py", line 142, in determine_consensus
    fastq_reads = read_fastq_file(fastq)
  File "/usr/local/bin/createConsensi.py", line 101, in read_fastq_file
    name, seed = name_root[0], int(name_root[1])
ValueError: invalid literal for int() with base 10: 'c'

I tried modifying line 101 in createConsensi.py from:

    name, seed = name_root[0], int(name_root[1])

to:

    name, seed = name_root[0], int(float(name_root[1]))

and got the same error.

Then I tried:

    name, seed = name_root[0], int(name_root[1], 16)

and this ran to completion and generated output. However, the Isoform_Consensi.fasta and Isoform_Consensi_filtered.fasta files contain only headers and no sequences, e.g.

chr45l196242-3r203094~5l196244-3r203095~5l196245-3r203096~+_47474642_47483237_40.0_69.0_9

chr12__+_12911985_12912385_39.5_55.0_4
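A minimal sketch of why the base-16 change made the error disappear without fixing the underlying parse: `c` is a valid hexadecimal digit, so `int('c', 16)` succeeds and returns 12, while the other two conversions reject it. The helper name `try_convert` is hypothetical and exists only for this illustration; it is not part of createConsensi.py.

```python
def try_convert(label, func, text):
    """Apply func to text; return (label, result) or (label, exception name)."""
    try:
        return label, func(text)
    except ValueError as exc:
        return label, type(exc).__name__


field = "c"  # the value reported in the traceback

results = [
    try_convert("int(x)", int, field),
    try_convert("int(float(x))", lambda x: int(float(x)), field),
    try_convert("int(x, 16)", lambda x: int(x, 16), field),
]

for label, outcome in results:
    print(f"{label:15} -> {outcome}")
```

The base-16 call therefore assigns a seed of 12 to a field that was never a splint position, which would be consistent with the run completing while the output is still wrong.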

Now I'm not sure whether to keep trying to figure out the problem in createConsensi.py, or whether I have somehow generated illegal fasta names in the C3POa pipeline.

rvolden commented 5 years ago

This is definitely a strange fasta name. The script is trying to get the splint position out of the fasta name, but when it splits the name, it's getting a string ('c') instead of a number. Did you adjust your sequence headers?
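To find which header triggers this, a defensive version of the parse can report the offending name instead of failing with a bare ValueError. This is only a sketch: the function `parse_read_name` is hypothetical, and the `_` separator is an assumption inferred from the `name_root[0]` / `name_root[1]` indexing in the traceback, not taken from the actual createConsensi.py source.

```python
def parse_read_name(header):
    """Split a read header into (name, seed), assuming a 'name_seed' layout.

    Assumption: fields are separated by '_' and the second field is the
    integer splint position. This mirrors the indexing seen in the
    traceback, not the real createConsensi.py code.
    """
    # Drop a leading fastq '@' or fasta '>' marker before splitting.
    name_root = header.lstrip("@>").split("_")
    if len(name_root) < 2:
        raise ValueError(f"no '_'-separated seed field in header: {header!r}")
    try:
        seed = int(name_root[1])
    except ValueError:
        raise ValueError(
            f"seed field {name_root[1]!r} is not an integer in header: {header!r}"
        ) from None
    return name_root[0], seed


# Read name in the format reported for R2C2_raw_reads.fastq in this thread.
print(parse_read_name("@672de281-f276-48f8-b1b2-b3e0d249c56c_3075"))
```

Running the same function over every header in the fastq passed to createConsensi.py should surface the first name whose second field is not an integer.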

morellr commented 5 years ago

Hi rvolden,

The read names before and after C3POa_preprocessing look like this:

@672de281-f276-48f8-b1b2-b3e0d249c56c (read name in the original AlbOut1/0/workspace/pass/*.fastq)
@672de281-f276-48f8-b1b2-b3e0d249c56c_3075 (read name in R2C2_raw_reads.fastq)

i.e. "_3075" is the splint position.

When I run C3POa.py the read names in the R2C2_Consensus.fasta appear to have lost the splint position information (?) e.g.

672de281-f276-48f8-b1b2-b3e0d249c56c_11.93_4271_1_2426

The initial stdout messaging on C3POa.py shows this error:

./tmp1/672de281-f276-48f8-b1b2-b3e0d249c56c_consensus_1.fasta
./tmp1/c1623b0d-fc3d-4f57-8ed8-408a70b7bd0e_consensus_1.fasta
/usr/lib64/python3.6/site-packages/numpy/core/_methods.py:59: RuntimeWarning: Mean of empty slice.
  warnings.warn("Mean of empty slice.", RuntimeWarning)
/usr/lib64/python3.6/site-packages/numpy/core/_methods.py:70: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)

The run nevertheless proceeds and produces output that looked correct to me and is read by C3POa_postprocessing.py, so I ignored the message. But now I'm wondering if this is the step where I somehow fail to create the right fasta name.

After C3POa_postprocessing.py, the readname looks like this:

672de281-f276-48f8-b1b2-b3e0d249c56c_11.93_4271_1_2426_2107

rvolden commented 5 years ago

This looks fine to me, since you shouldn't need the seed after C3POa.py. The numpy runtime warning is normal and can be ignored.