ncbi / ngs-tools

Error at tax_analysis_parser #17

Closed Alsafadi closed 3 years ago

Alsafadi commented 3 years ago

Hi there,

I have managed to run through most of the workflow: I created sparse and dense databases and ran my samples through the two-step analysis.

When I try to run hits-to-report.sh at the end, I get an error from the tax_analysis_parser.py as seen below:

I tried to manipulate the entries (e.g., removing any colons (:) and so on), but I still get an error that the hits are not sorted, even though the sort command precedes all of this.

Traceback (most recent call last):
  File "./bin/tax_analysis_parser.py", line 242, in <module>
    main()
  File "./bin/tax_analysis_parser.py", line 237, in main
    xml = parse(f, conn, args.wgs_mode, args.compact, args.include_tax_id or [])
  File "./bin/tax_analysis_parser.py", line 196, in parse
    for hits in iterate_merged_spots(f):
  File "./bin/tax_analysis_parser.py", line 128, in iterate_merged_spots
    assert last_spot <= spot, 'input is not sorted'
AssertionError: input is not sorted

Please let me know if you have any insight into why this may happen.

Thanks,
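For anyone hitting the same assertion, a quick way to locate the offending line is to replay the ordering check outside the parser. The snippet below is only a sketch: it assumes the spot name is the first tab-separated field of each line and that spots are compared as plain strings, neither of which is confirmed in this thread. The function name `first_unsorted_line` and the sample data are made up for illustration.

```python
import io


def first_unsorted_line(lines, sep="\t"):
    """Return (line_number, previous_key, key) for the first line whose key
    sorts before the previous one, or None if the input is in order.

    Assumption: the sort key is the first field of each line.
    """
    last_key = None
    for lineno, line in enumerate(lines, start=1):
        key = line.rstrip("\n").split(sep, 1)[0]
        if last_key is not None and key < last_key:
            return lineno, last_key, key
        last_key = key
    return None


# Hypothetical hits data; the real file layout may differ.
sample = io.StringIO("spot1\t9606\nspot3\t562\nspot2\t9606\n")
print(first_unsorted_line(sample))  # (3, 'spot3', 'spot2')
```

Passing an open file handle instead of the `io.StringIO` sample should work the same way, since the function only iterates over lines.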

Alsafadi commented 3 years ago

Hey there,

I have managed to figure out the issue and fix it. The problem was that my hits file contained more than one tab character ("\t") per line, which prevented tax_analysis_parser.py from interpreting the file correctly.
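A small check like the following may help confirm this kind of problem before running the report step. It is a sketch that assumes a well-formed line has exactly one tab, which is my reading of the description above and not something taken from the parser itself. The name `lines_with_unexpected_tabs` and the sample lines are hypothetical.

```python
def lines_with_unexpected_tabs(lines, expected_tabs=1):
    """Return the 1-based numbers of lines whose tab count differs from
    expected_tabs. Blank lines contain no tabs, so they are reported too."""
    return [
        lineno
        for lineno, line in enumerate(lines, start=1)
        if line.rstrip("\n").count("\t") != expected_tabs
    ]


# Hypothetical hits lines: the second has an extra tab, the third is blank.
sample = ["spot1\t9606\n", "spot2\t562\textra\n", "\n"]
print(lines_with_unexpected_tabs(sample))  # [2, 3]
```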

I solved it by adding a line to hits-to-report.sh. In my case, the colon character appeared in multiple locations, along with an equal sign, so I removed these as well, in addition to empty lines:

sed 's/\://g' $hits | sed 's/\+//g' | sed '/^$/d' | sed '/^ $/d' | sed 's/ //g' > $hits.fixed
sort $hits.fixed > $hits.sorted

Of course, I am unsure how empty lines ended up in these files, but this takes care of them too.
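For reference, the same cleanup can be sketched in Python. This roughly mirrors the sed pipeline as I read it (strip colons, plus signs, and spaces, drop lines that end up empty, then sort) and is not part of ngs-tools. The function name `clean_and_sort` and the sample lines are made up for illustration.

```python
def clean_and_sort(lines, drop_chars=":+ "):
    """Remove every character in drop_chars from each line, discard lines
    that end up empty, and return the remaining lines sorted."""
    table = str.maketrans("", "", drop_chars)
    cleaned = (line.rstrip("\n").translate(table) for line in lines)
    return sorted(line for line in cleaned if line)


# Hypothetical hits lines containing stray characters and blank lines.
sample = ["spot2\t562: x3\n", "\n", " \n", "spot1\t9606+ x1\n"]
print(clean_and_sort(sample))  # ['spot1\t9606x1', 'spot2\t562x3']
```

Tabs are left untouched here, just as in the sed command. Note that Python's `sorted` orders strings by code point, which can differ from the `sort` command under a non-C locale.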

Best regards, Hani