Closed ChrisFearn97 closed 2 years ago
I will take a look at this. My suspicion is that there is a problem in genSeqErrorModel.
Hello, I have been facing the same MutableSeq issue. I used the following command:
/usr/local/biotools/python/3.8.1/bin/python ./NEAT/gen_reads.py -r hs37d5.fa -tr target.bed -R 151 -c 45 --force-coverage -E 0.002 -M 0 -v ins.vcf --pe 255 84 -p 2 --bam --vcf -o fastq_files/neat-25-125-ins-NGS118
This command runs on the old version of NEAT.
This issue seems slightly different, because you aren't using the sequencing error model. I'll let you know if this should be in another ticket so we can track it separately once I have a chance to dive into this error.
So far, the only way I've been able to reproduce this error is with Biopython 1.78. Your environment listing says you have Biopython 1.79, but I would double-check that this is the case for the specific environment running the script. Biopython made a substantial change to how MutableSeq objects are handled between 1.78 and 1.79. Add these lines of code to the top of main to check the version:
import Bio
print(Bio.__version__)
It is using version 1.76.
Try updating to 1.79 and let me know if that doesn't solve the problem.
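For anyone who wants the check to run automatically, here is a minimal sketch of a start-up guard. It reads the Biopython version from the interpreter that is actually running the script and compares it against 1.79, the version suggested above. The helper names (`parse_version`, `biopython_is_recent`, `installed_biopython`) are made up for illustration and are not part of NEAT or Biopython.

```python
from importlib import metadata  # standard library, Python 3.8+

MINIMUM = (1, 79)  # version suggested in this thread


def parse_version(text):
    """Turn '1.79' into (1, 79); stop at the first non-numeric piece (e.g. 'dev0')."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def biopython_is_recent(installed, minimum=MINIMUM):
    """Return True if the given version string is at least `minimum`."""
    return parse_version(installed) >= minimum


def installed_biopython():
    """Return the Biopython version seen by this interpreter, or None if absent."""
    try:
        return metadata.version("biopython")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_biopython()
    if found is None:
        print("Biopython is not installed in this environment")
    elif not biopython_is_recent(found):
        print(f"Biopython {found} found; try updating to 1.79 or later")
    else:
        print(f"Biopython {found} looks recent enough")
```

Running this with the same interpreter path used to launch gen_reads.py should show whether that environment, rather than some other one on the machine, has the older Biopython.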
I have been receiving an error message when trying to run NEAT to generate artificial sequence data with the gen_reads.py script. Any help would be much appreciated, thanks!
To Reproduce
I have only recently started using NEAT and have set up a conda environment containing the dependencies. My conda environment details are in the attached text file, conda_list.txt.
I have then done a git clone of the repository and run the setup.py script.
I have then used the genSeqErrorModel.py script to generate an error model for the types of reads I would like to simulate using:
python genSeqErrorModel.py -i 150_reads1.fq -i2 150_reads2.fq -o errormodel_150
This then generates the file errormodel_150.pickle.gz
I have then used bedtools genomecov and then compute_gc.py to generate a gc coverage bias:
bedtools genomecov -ibam 150_reads.bam -g ref.fa > 150_reads.bed
python compute_gc.py -r ref.fa -i 150_reads.bed -o gc_cov
This generates the file gc_cov.pickle.gz
From here I have run the following command:
python gen_reads.py -R 150 --vcf -p 1 -e errormodel_150.pickle.gz --gc-model gc_cov.pickle.gz -r ref.fa -o sim_data150
and receive the following output:
found index ref.fa.fai
reading NC_012920.1... 0.008 (sec)
sampling reads... [Traceback (most recent call last):
File "gen_reads.py", line 892, in <module>
main()
File "gen_reads.py", line 615, in main
all_inserted_variants = sequences.random_mutations()
File "/nfs/anaconda3/envs/read_sim/lib/python3.8/site-packages/NEAT-3.0-py3.8.egg/source/SequenceContainer.py", line 600, in random_mutations
temp = MutableSeq(self.sequences[i])
File "/nfs/anaconda3/envs/read_sim/lib/python3.8/site-packages/Bio/Seq.py", line 1662, in __init__
raise TypeError(
TypeError: The sequence data given to a MutableSeq object should be a string or an array (not a Seq object etc)
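The TypeError above says that MutableSeq was handed a Seq object rather than a string. Where updating Biopython is not an option, one possible workaround is to convert the sequence to a plain string before wrapping it. This is a minimal sketch, not NEAT's actual fix. It assumes `self.sequences[i]` is a Seq-like object whose `str()` gives the full sequence. The `to_mutable` helper and its `mutable_cls` parameter are hypothetical and exist only so the idea can be tried without Biopython installed.

```python
def to_mutable(sequence, mutable_cls=None):
    """Build a mutable sequence from a str, a Seq, or another Seq-like object.

    Going through str() hands the constructor plain string data, which
    MutableSeq accepts in both older and newer Biopython releases.
    """
    if mutable_cls is None:
        # Imported lazily so this module still loads where Biopython is absent.
        from Bio.Seq import MutableSeq
        mutable_cls = MutableSeq
    return mutable_cls(str(sequence))


class FakeSeq:
    """Stand-in for a Seq object (illustration only): str() returns the bases."""

    def __init__(self, data):
        self._data = data

    def __str__(self):
        return self._data


if __name__ == "__main__":
    # Use list as a stand-in mutable container; with Biopython installed,
    # omit mutable_cls to get a real MutableSeq instead.
    bases = to_mutable(FakeSeq("ACGT"), mutable_cls=list)
    bases[0] = "T"
    print("".join(bases))
```

In the traceback's terms, the equivalent change would be `MutableSeq(str(self.sequences[i]))` at the failing line, though updating Biopython as suggested in the replies is the simpler route.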
Expected behavior
I would have expected it to output reads of 150bp in length in fastq format with a VCF file that have a similar gc content and error profile to real sequence data I possess.
Additional context
I am trying to get this working on a remote server that is running on Ubuntu 20.04.3