morrislab / phylowgs

Application for inferring subclonal composition and evolution from whole-genome sequencing data.
GNU General Public License v3.0

problem with parse_cnvs.py script #133

Open shaghayeghsoudi opened 3 years ago

shaghayeghsoudi commented 3 years ago

Hello, I recently started working with PhyloWGS. After installing, compiling, and testing the scripts, I ran into a problem with the parse_cnvs.py script. I have Battenberg output, so running it should be straightforward, but I keep getting this error message:

python ./parse_cnvs.py -f battenberg -c 0.27 data.test.battenberg.txt

error:

  File "./parse_cnvs.py", line 195, in <module>
    main()
  File "./parse_cnvs.py", line 191, in main
    regions = parser.parse()
  File "./parse_cnvs.py", line 111, in parse
    end = int(fields[3 + self._field_offset])
ValueError: invalid literal for int() with base 10: '0.610923189999321'

I just do not understand what is wrong. end = int(fields[3 + self._field_offset]) should read the end position of the CNV, and I do not know why it picks up the BAF in column 4 (= '0.610923189999321') instead. Any ideas? I appreciate the help; I have been stuck on this for several days.

dancooke commented 3 years ago

@shaghayegh-flower I just ran into this error, and specifying battenberg-smchet rather than just battenberg resolved it. In your case:

$ python ./parse_cnvs.py -f battenberg-smchet -c 0.27 data.test.battenberg.txt

It looks like some Battenberg output has an ID column while others don't.
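One way to check which variant a file is before running the parser is to look at the fourth field of a data row. This is only a rough sketch based on the traceback above: it assumes rows look like `chr startpos endpos BAF ...`, optionally preceded by one ID column, and the helper names `has_leading_id_column` and `suggest_format` are made up for illustration (they are not part of PhyloWGS).

```python
def has_leading_id_column(data_row):
    """Guess whether a Battenberg data row starts with an extra ID column.

    Assumption (not confirmed by the PhyloWGS docs): with an ID column the
    fourth field is the integer end position; without it, the fourth field
    is the fractional BAF, e.g. '0.610923189999321'.
    A BAF written as a bare '0' or '1' would fool this check.
    """
    fields = data_row.split()
    if len(fields) < 4:
        raise ValueError("expected at least 4 whitespace-separated fields")
    try:
        int(fields[3])
    except ValueError:
        return False
    return True


def suggest_format(data_row):
    """Return the -f value that probably matches this row layout."""
    if has_leading_id_column(data_row):
        return "battenberg"
    return "battenberg-smchet"
```

Pass it the first line after the header, for example `suggest_format(open(path).readlines()[1])`, and use the result as the `-f` argument.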

Once I got this working, however, I ran into another error:

Traceback (most recent call last):
  File "/well/gerton/dan/apps/phylowgs/parser/parse_cnvs.py", line 195, in <module>
    main()
  File "/well/gerton/dan/apps/phylowgs/parser/parse_cnvs.py", line 191, in main
    regions = parser.parse()
  File "/well/gerton/dan/apps/phylowgs/parser/parse_cnvs.py", line 117, in parse
    cnv1['major_cn'] = int(fields[8 + self._field_offset])
ValueError: invalid literal for int() with base 10: 'NA'

The parser doesn't nicely handle invalid field values, so I had to remove these rows manually before calling the parser:

$ awk '{if ($8!="NA"&&$9!="NA") print}' data.test.battenberg.txt > data.test.battenberg_noNA.txt
$ python ./parse_cnvs.py -f battenberg-smchet -c 0.27 data.test.battenberg_noNA.txt
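If awk isn't available, the same filtering can be sketched in Python. This mirrors the awk one-liner above (drop any row whose 8th or 9th whitespace-separated column is the literal string NA, keep everything else, including the header); `drop_na_rows` and `filter_file` are illustrative names, not part of PhyloWGS.

```python
def drop_na_rows(lines, columns=(8, 9)):
    """Yield lines whose given 1-based columns are not the literal 'NA'.

    Rows with fewer fields than a checked column are kept unchanged,
    matching awk, where a missing field compares as the empty string.
    """
    for line in lines:
        fields = line.split()
        if any(len(fields) >= c and fields[c - 1] == "NA" for c in columns):
            continue
        yield line


def filter_file(src_path, dst_path):
    """Write a copy of src_path to dst_path without the NA rows."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(drop_na_rows(src))
```

For the files in this thread that would be `filter_file("data.test.battenberg.txt", "data.test.battenberg_noNA.txt")`, followed by the same parse_cnvs.py call on the filtered file.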

Hope that helps!

shaghayeghsoudi commented 3 years ago

Thanks a lot @dancooke. That helped a lot.