Open liupfskygre opened 2 years ago
I reran the command and got the following errors, similar to the ones above:
Reference base at each position will be the consensus of all files.
Getting codon usage bias...
Finalizing SNPs...
Updating genes with consensus bases...
Updating genomes with consensus bases...
MetaPop SNP refinement finished at: 05/02/2022 11:40:48
Linking SNPs starting at: 05/02/2022 11:40:48...multiprocessing.pool.RemoteTraceback:
Traceback (most recent call last):
File "/home/PTPE2/Software/miniconda3/envs/metapop/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/home/PTPE2/Software/miniconda3/envs/metapop/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/home/PTPE2/Software/miniconda3/envs/metapop/lib/python3.7/site-packages/metapop/metapop_mine_reads.py", line 143, in read_one_range
leftmost = int(segs[3].decode())
ValueError: invalid literal for int() with base 10: 'TGLS2_1908_Scaff092085'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/PTPE2/Software/miniconda3/envs/metapop/bin/metapop", line 8, in <module>
sys.exit(main())
File "/home/PTPE2/Software/miniconda3/envs/metapop/lib/python3.7/site-packages/metapop/metapop_main.py", line 300, in main
linked_file = metapop.metapop_mine_reads.do_mine_reads(output_directory_base, threads)
File "/home/PTPE2/Software/miniconda3/envs/metapop/lib/python3.7/site-packages/metapop/metapop_mine_reads.py", line 450, in do_mine_reads
res = access_read_ranges(selections_to_read, threads, output_directory)
File "/home/PTPE2/Software/miniconda3/envs/metapop/lib/python3.7/site-packages/metapop/metapop_mine_reads.py", line 202, in access_read_ranges
res = pool.map(read_one_range, ranges)
File "/home/PTPE2/Software/miniconda3/envs/metapop/lib/python3.7/multiprocessing/pool.py", line 268, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/home/PTPE2/Software/miniconda3/envs/metapop/lib/python3.7/multiprocessing/pool.py", line 657, in get
raise self._value
ValueError: invalid literal for int() with base 10: 'TGLS2_1908_Scaff092085'
Hi Ann, I looked into this in more detail. In metapop_mine_reads.py, I see that segs[3] is assigned to ref_base (ref_base = segs[3]):
fh = open(file)
for line in fh:
    if line.endswith("True\n"):
        segs = line.strip().split("\t")
        #contig_pos = segs[0]
        contig = segs[1]
        pos = int(segs[2])
        ref_base = segs[3]
        source = segs[9]
        snps = segs[10]
        contig_gene = segs[11]
        #if OC == 1, strand = forward, else strand = reverse
        OC = int(segs[14])
        codon = int(segs[15])
        pos_in_codon = int(segs[16])
        linked_data[source][contig][contig_gene][codon][OC].append([pos, ref_base, snps, pos_in_codon])
fh.close()
I guess the file here refers to genic_snps.tsv in the MetaPop/07.Cleaned_SNPs directory, which has the following header, right?
contig_pos contig pos ref_base depth a_ct t_ct c_ct g_ct source snps contig_gene start end OC codon pos_in_codon link
If so, then ref_base = segs[3] should be a single base ('A', 'T', 'C', or 'G'), right?
In my case, it is something else.
And even with A, T, C, or G, int(segs[3]) would raise an error, e.g. int('T').
So, which file does this code refer to, and how can this be fixed?
Thanks, Pengfei
Hi Pengfei - let me pass these errors on to Kenji. He's the mastermind behind the new code. We'll get back to you soon!
That line caused the same error for another user. The problem was that the mapping tool he had used, BBMap, took more information from the deflines of his reads than just the sequence ID, and the additional information contained whitespace.
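If you want to check whether your own alignments are affected, something like the following might help. This is only a rough sketch, not part of MetaPop: names_with_whitespace is a hypothetical helper, the sample records are made up, and it assumes you have already dumped the alignments to text (for example with samtools view) so that each record is one tab-separated line.

```python
def names_with_whitespace(sam_lines):
    """Yield read names (first tab-separated field) that contain a space.

    Hypothetical helper, not part of MetaPop. Expects SAM-formatted text lines.
    """
    for line in sam_lines:
        # Skip SAM header lines, which start with "@".
        if line.startswith("@"):
            continue
        # The read name is everything before the first tab.
        qname = line.split("\t", 1)[0]
        if " " in qname:
            yield qname


# Made-up records for illustration only.
sample = [
    "@HD\tVN:1.6",
    "read_001\t0\tcontig_A\t10\t42\t4M\t*\t0\t0\tACGT\tFFFF",
    "read_002 extra info\t0\tcontig_A\t25\t42\t4M\t*\t0\t0\tACGT\tFFFF",
]

flagged = list(names_with_whitespace(sample))
print(flagged)
```

Any name that gets flagged here would be split into more than one piece by a whitespace split.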
The split to create segs in the mine_reads script is done by issuing a call to samtools, reading the output into Python, and splitting each line on whitespace. If a line contains more whitespace than expected, the position of the read in the reference genome is shifted past the 4th field of the split line.
We already have a new version of the code up that fixes this problem. The split now happens on tabs (samtools output is tab-separated) instead of on any whitespace.
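To illustrate the difference, here is a minimal sketch. It is not the actual MetaPop code: the SAM record and the leftmost_position helper are made up for the example, and it works on plain strings rather than the bytes that the real script decodes. The read name contains a space, as it would if the mapper kept the full defline.

```python
# Made-up SAM record; the read name (first field) contains a space.
sam_line = (
    "read_001 1:N:0:ATCACG\t0\tTGLS2_1908_Scaff092085\t1532\t42\t100M"
    "\t*\t0\t0\tACGT\tFFFF"
)


def leftmost_position(line, sep=None):
    """Return field index 3 as an int.

    Hypothetical helper. sep=None splits on any run of whitespace,
    sep="\t" splits on tabs only.
    """
    segs = line.strip().split(sep)
    return int(segs[3])


def demo():
    whitespace_error = None
    try:
        # Any-whitespace split: the space in the read name adds a field,
        # so index 3 is the contig name instead of the position.
        leftmost_position(sam_line)
    except ValueError as err:
        whitespace_error = str(err)
    # Tab split: the read name stays in one field and index 3 is the position.
    tab_position = leftmost_position(sam_line, sep="\t")
    return whitespace_error, tab_position


whitespace_error, tab_position = demo()
print(whitespace_error)
print(tab_position)
```

With the whitespace split, int() is handed the contig name, which is the same kind of ValueError as in the traceback above. With the tab split, field 3 is the position again.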
Hi, Ann, I got an error when running metapop installed from pip with the following command: metapop --input_samples ./bamfile --reference ./reference --norm tp-notp-166-metapop_ctfile.txt --threads 60
The installation should be fine, since I ran the toy dataset and it completed successfully.
The error info is below. Do you have any suggestions on how to fix it?
Thanks. Pengfei
error info
File "/home/PTPE2/Software/miniconda3/envs/metapop/lib/python3.7/site-packages/metapop/metapop_mine_reads.py", line 450, in do_mine_reads
res = access_read_ranges(selections_to_read, threads, output_directory)
File "/home/PTPE2/Software/miniconda3/envs/metapop/lib/python3.7/site-packages/metapop/metapop_mine_reads.py", line 202, in access_read_ranges
res = pool.map(read_one_range, ranges)
File "/home/PTPE2/Software/miniconda3/envs/metapop/lib/python3.7/multiprocessing/pool.py", line 268, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/home/PTPE2/Software/miniconda3/envs/metapop/lib/python3.7/multiprocessing/pool.py", line 657, in get
raise self._value
ValueError: invalid literal for int() with base 10: 'KQGRI2_20_08_k141_904564'