jradrion / TEFLoN

TEFLoN uses paired-end Illumina sequence data to discover and genotype transposable elements present in your samples.

teflon.v0.4.py IndexError: list index out of range #1

Closed: cecilelorrain closed this issue 5 years ago

jradrion commented 5 years ago

Hi Cécile,

I noticed that you closed this issue on GitHub; were you able to resolve it? Was it a formatting issue with the hierarchy file? I apologize, as I have not revisited this code in quite some time. Please let me know if you are still running into errors and I will do my best to fix whatever the issue may be.

Best regards, Jeff

On Tue, May 7, 2019 at 5:52 AM cecilelorrain notifications@github.com wrote:

Hi,

I am currently trying to run TEFLoN on a dataset with a similar design to yours, but in a fungus. I have my own TE annotation, produced with the REPET pipeline (https://urgi.versailles.inra.fr/Tools/REPET). I followed the test_files format to create the hierarchy file (a tab-separated text file with unique IDs corresponding to the annotated TEs):

    $ head 2855_CJ.hier
    id family order
    ms2463_chr_1_RIX-comp_Zt09-B-R1-Map20 RTE RIX
    ms2467_chr_1_RIX-comp_Zt09-B-R2-Map20 I RIX
    ms2485_chr_1_RXX-TRIM_Zt09-B-R4-Map3 TRIM RXX-TRIM
    ms2464_chr_1_RIX-comp_Zt09-B-R1-Map20 RTE RIX
    ms2486_chr_1_RXX-TRIM_Zt09-B-R4-Map3 TRIM RXX-TRIM
    ms2465_chr_1_RIX-comp_Zt09-B-R1-Map20 RTE RIX
    ms2478_chr_1_RLX-incomp_Zt09-B-G236-Map12_reversed Gypsy RLX
    ms2476_chr_1_noCat_Zt09-B-G87-Map20 noCat noCat
    ms2852_chr_1_RIX-comp_Zt09-B-R1-Map20 RTE RIX
    ms2459_chr_1_RLX-incomp_Zt09-B-P25.14-Map9 Gypsy RLX
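A quick way to catch formatting problems in a hierarchy file like the one above is to check every row before running TEFLoN. The sketch below is not part of TEFLoN; it assumes, from the test_files layout described here, a header row followed by exactly three tab-separated columns (id, family, order) with unique IDs. The function name `check_hierarchy` is hypothetical, and the expected header is a parameter in case your column names differ.

```python
import csv
import io


def check_hierarchy(text, expected_header=("id", "family", "order")):
    """Return a list of formatting problems found in a hierarchy table.

    Assumes a header row followed by one tab-separated row per annotated TE,
    with the TE id in the first column and ids unique across rows.
    """
    problems = []
    seen_ids = set()
    n_cols = len(expected_header)
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    for lineno, row in enumerate(reader, start=1):
        if lineno == 1:
            # The header must name the columns exactly as expected.
            if tuple(row) != tuple(expected_header):
                problems.append(f"line 1: header is {row}, expected {list(expected_header)}")
            continue
        if not row:
            # csv.reader yields an empty list for a blank line.
            problems.append(f"line {lineno}: blank line")
            continue
        if len(row) != n_cols:
            # Space-separated rows show up here as a single field.
            problems.append(f"line {lineno}: {len(row)} tab-separated fields, expected {n_cols}")
            continue
        if row[0] in seen_ids:
            problems.append(f"line {lineno}: duplicate id {row[0]}")
        seen_ids.add(row[0])
    return problems


# Hypothetical example: the third line uses spaces instead of tabs.
example = "id\tfamily\torder\nms1\tRTE\tRIX\nms2 TRIM RXX-TRIM\n"
print(check_hierarchy(example))  # prints ['line 3: 1 tab-separated fields, expected 3']
```

On a real file you would pass the file contents, e.g. `check_hierarchy(open("2855_CJ.hier").read())`. Rows whose tabs were converted to spaces (for example by copying from a terminal or spreadsheet) are one plausible way a hierarchy file can look right with `head` and still be misparsed.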

I have an IndexError at the teflon.v0.4.py step but cannot figure out why:

    python /home/lorrain/Mutation_accumulation_MH/TEFLoN/teflon.v0.4.py \
        -wd ${workdir} -d ${out_dir}${file}.prep_TF/ -s ${workdir}Samples.txt -i ${file} -eb /data/biosoftware/bwa/bwa-0.7.17/bwa -es /data/biosoftware/samtools/samtools-1.4/samtools -l1 family -l2 order -q 30
    Traceback (most recent call last):
      File "/home/lorrain/Mutation_accumulation_MH/TEFLoN/teflon.v0.4.py", line 442, in <module>
        main()
      File "/home/lorrain/Mutation_accumulation_MH/TEFLoN/teflon.v0.4.py", line 208, in main
        with open(hierFILE, 'r') as fIN:
    IOError: [Errno 2] No such file or directory: '/home/lorrain/Mutation_accumulation_MH/strain1/TEFLON/2855_CJ/2855_CJ.prep_TF/2855_CJ.hier'
    -bash-4.2$ python /home/lorrain/Mutation_accumulation_MH/TEFLoN/teflon.v0.4.py -wd ${workdir} -d ${workdir}${file}.prep_TF/ -s ${workdir}Samples.txt -i ${file} -eb /data/biosoftware/bwa/bwa-0.7.17/bwa -es /data/biosoftware/samtools/samtools-1.4/samtools -l1 family -l2 order -q 30
    Traceback (most recent call last):
      File "/home/lorrain/Mutation_accumulation_MH/TEFLoN/teflon.v0.4.py", line 442, in <module>
        main()
      File "/home/lorrain/Mutation_accumulation_MH/TEFLoN/teflon.v0.4.py", line 218, in main
        if line.split()[1] == args.ID:
    IndexError: list index out of range
    -bash-4.2$ python /home/lorrain/Mutation_accumulation_MH/TEFLoN/teflon.v0.4.py -d ${workdir}${file}.prep_TF/ -s ${workdir}Samples.txt -i ${file} -eb /data/biosoftware/bwa/bwa-0.7.17/bwa -es /data/biosoftware/samtools/samtools-1.4/samtools -l1 family -l2 order -q 30
    Traceback (most recent call last):
      File "/home/lorrain/Mutation_accumulation_MH/TEFLoN/teflon.v0.4.py", line 442, in <module>
        main()
      File "/home/lorrain/Mutation_accumulation_MH/TEFLoN/teflon.v0.4.py", line 218, in main
        if line.split()[1] == args.ID:
    IndexError: list index out of range
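The IndexError itself is easy to reproduce: `str.split()` with no argument returns an empty list for a blank line, and a one-element list for a row with no whitespace in it, so indexing `[1]` fails. The thread does not say which file is being read at line 218 (`args.ID` suggests the sample ID passed with `-i`, but that is an assumption), so the sketch below is a generic, hypothetical helper (`matching_rows`) that reports short rows instead of crashing. It is not TEFLoN's code, and the paths and sample names are made up.

```python
def matching_rows(lines, sample_id, column=1):
    """Split whitespace-separated rows and collect those whose field at
    `column` equals sample_id.

    Rows with too few fields (blank lines, or rows whose separators were
    lost) are returned in `skipped` instead of raising IndexError.
    """
    matches, skipped = [], []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()  # "\n".split() == [] and "a".split() == ["a"]
        if len(fields) <= column:
            skipped.append((lineno, line.rstrip("\n")))
            continue
        if fields[column] == sample_id:
            matches.append(fields)
    return matches, skipped


# The failure mode from the traceback: a blank line has no field at index 1.
try:
    "\n".split()[1]
except IndexError as exc:
    print("IndexError:", exc)  # prints "IndexError: list index out of range"

# Hypothetical sample table with a stray blank line in the middle.
rows = ["/path/a.bam\tsampleA\n", "\n", "/path/b.bam\tsampleB\n"]
print(matching_rows(rows, "sampleB"))  # prints ([['/path/b.bam', 'sampleB']], [(2, '')])
```

Running a check like this over each input file (samples file and hierarchy file) would point to the offending line number, which the original traceback does not report.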

Could you help me with this? Did I make a mistake in the hierarchy file? Thank you in advance,

Cécile Lorrain

PS: The test_files ran perfectly.


cecilelorrain commented 5 years ago

Dear Jeff,

Thank you for your answer! I found the right way to format the hierarchy file; that's why I closed the issue. TEFLoN is a very clever pipeline for assessing TE insertions.

May I ask a few specific questions?

  1. The data I am working with were sequenced as 250 nt reads, so the two reads of a pair partly overlap, and the insert size standard deviation is around 140 nt. Do you think overlapping read pairs could cause problems?
  2. I was wondering how to set the -n1/-n2 parameters for teflon_collapse.py properly; do you have any suggestions? (If I understood your paper correctly, n1 corresponds to ">3 reads in the focal subline" and n2 corresponds to ">5 reads in non-focal subline".)

Thank you again, Best, Cécile

jradrion commented 5 years ago

Hi Cécile,

1) Overlapping read pairs are definitely not ideal. TEFLoN requires that one end of a read pair map to the TE database and the other end map to the reference, and it cannot discover TEs by split-read mapping. It could still work, provided that enough pairs in the insert-size distribution have ends separated far enough apart that their primary mappings do not overlap. I've never tested such a scenario, but I think it would reduce the chance that you discover the locations of new TEs in your samples. However, once the insertion positions have been discovered, read overlap should have less of an effect on estimating their allele frequencies. You might want to check whether there are any newer TE-calling programs that use split-read mappers, as that is probably the best way to address the TE-calling problem now anyhow.
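To get a feel for how many pairs might still have non-overlapping mates, you can estimate the fraction of fragments longer than 2 × 250 = 500 nt. The sketch below assumes a normally distributed fragment length and takes "insert size" to mean total fragment length (some tools define it differently). The thread only gives the standard deviation (about 140 nt), so the mean values used here are hypothetical.

```python
import math


def fraction_non_overlapping(mean_insert, sd_insert, read_length):
    """Estimated fraction of read pairs whose mates do not overlap.

    Assumes fragment length ~ Normal(mean_insert, sd_insert) and that the
    mates overlap whenever the fragment is shorter than 2 * read_length.
    """
    threshold = 2 * read_length
    z = (threshold - mean_insert) / sd_insert
    # P(fragment > threshold) = 1 - Phi(z) = 0.5 * erfc(z / sqrt(2))
    return 0.5 * math.erfc(z / math.sqrt(2))


READ_LENGTH = 250  # from the thread
SD_INSERT = 140    # from the thread (approximate)
for mean in (400, 500, 600):  # hypothetical mean fragment lengths
    frac = fraction_non_overlapping(mean, SD_INSERT, READ_LENGTH)
    print(f"mean {mean} nt: {frac:.2f} of pairs have non-overlapping mates")
```

With a mean of exactly 500 nt, half of the pairs clear the threshold by construction. The real fraction depends on the actual insert-size distribution, which you can measure from your own alignments rather than assume to be normal.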

2) Given that you will likely have fewer reads able to locate new TE positions, I would initially set both of those parameters to 1. This should give you the most power to identify TEs, but it will also raise the false-positive rate.
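The power versus false-positive trade-off can be illustrated with a toy filter. This is a hypothetical sketch, not teflon_collapse.py: it assumes n1 is the minimum number of supporting reads required in at least one sample and n2 the minimum summed across all samples, which may not match TEFLoN's exact definitions (check the script's help or README). The candidate names and read counts are made up.

```python
def passes_thresholds(read_counts, n1, n2):
    """Hypothetical filter for one candidate insertion.

    read_counts: non-empty list of supporting-read counts, one per sample.
    Assumed semantics: at least n1 reads in some sample, and at least n2
    reads summed over all samples.
    """
    return max(read_counts) >= n1 and sum(read_counts) >= n2


# Made-up supporting-read counts for three candidates across three samples.
candidates = {
    "cand_A": [4, 3, 0],
    "cand_B": [1, 0, 0],
    "cand_C": [2, 1, 1],
}
for n1, n2 in [(3, 5), (1, 1)]:  # an arbitrary stricter setting vs. both set to 1
    kept = [name for name, counts in candidates.items() if passes_thresholds(counts, n1, n2)]
    print(f"n1={n1}, n2={n2}: kept {kept}")
```

With the stricter setting only cand_A survives; with both thresholds at 1, every candidate supported by even a single read is kept. That is where both the extra power and the extra false positives come from.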

Jeff

On Wed, May 8, 2019 at 2:34 AM cecilelorrain notifications@github.com wrote:
