vgteam / toil-vg

Distributed and cloud computing framework for vg
Apache License 2.0

Encoding issues with coordinates in id_ranges? #824

Open mlinderm opened 3 years ago

mlinderm commented 3 years ago

I am running into the following error with 1.6.2a1.dev415 and Python 3.6:

node003 2020-08-27 13:00:59,701 MainThread WARNING toil.leader: kind-run_merge_gams/instanceg9052nwo    Traceback (most recent call last):
node003 2020-08-27 13:00:59,701 MainThread WARNING toil.leader: kind-run_merge_gams/instanceg9052nwo      File "/modules/toil-vg/1.6.2a1.dev415/lib64/python3.6/site-packages/toil/worker.py", line 366, in workerScript
node003 2020-08-27 13:00:59,701 MainThread WARNING toil.leader: kind-run_merge_gams/instanceg9052nwo        job._runner(jobGraph=jobGraph, jobStore=jobStore, fileStore=fileStore, defer=defer)
node003 2020-08-27 13:00:59,701 MainThread WARNING toil.leader: kind-run_merge_gams/instanceg9052nwo      File "/modules/toil-vg/1.6.2a1.dev415/lib64/python3.6/site-packages/toil/job.py", line 1392, in _runner
node003 2020-08-27 13:00:59,702 MainThread WARNING toil.leader: kind-run_merge_gams/instanceg9052nwo        returnValues = self._run(jobGraph, fileStore)
node003 2020-08-27 13:00:59,702 MainThread WARNING toil.leader: kind-run_merge_gams/instanceg9052nwo      File "/modules/toil-vg/1.6.2a1.dev415/lib64/python3.6/site-packages/toil/job.py", line 1329, in _run
node003 2020-08-27 13:00:59,702 MainThread WARNING toil.leader: kind-run_merge_gams/instanceg9052nwo        return self.run(fileStore)
node003 2020-08-27 13:00:59,702 MainThread WARNING toil.leader: kind-run_merge_gams/instanceg9052nwo      File "/modules/toil-vg/1.6.2a1.dev415/lib64/python3.6/site-packages/toil/job.py", line 1533, in run
node003 2020-08-27 13:00:59,702 MainThread WARNING toil.leader: kind-run_merge_gams/instanceg9052nwo        rValue = userFunction(*((self,) + tuple(self._args)), **self._kwargs)
node003 2020-08-27 13:00:59,702 MainThread WARNING toil.leader: kind-run_merge_gams/instanceg9052nwo      File "/modules/toil-vg/1.6.2a1.dev415/lib64/python3.6/site-packages/toil_vg/vg_map.py", line 740, in run_merge_gams
node003 2020-08-27 13:00:59,702 MainThread WARNING toil.leader: kind-run_merge_gams/instanceg9052nwo        id_ranges = parse_id_ranges(job, id_ranges_file_id)
node003 2020-08-27 13:00:59,702 MainThread WARNING toil.leader: kind-run_merge_gams/instanceg9052nwo      File "/modules/toil-vg/1.6.2a1.dev415/lib64/python3.6/site-packages/toil_vg/vg_common.py", line 760, in parse_id_ranges
node003 2020-08-27 13:00:59,702 MainThread WARNING toil.leader: kind-run_merge_gams/instanceg9052nwo        return parse_id_ranges_file(id_range_file)
node003 2020-08-27 13:00:59,702 MainThread WARNING toil.leader: kind-run_merge_gams/instanceg9052nwo      File "/modules/toil-vg/1.6.2a1.dev415/lib64/python3.6/site-packages/toil_vg/vg_common.py", line 771, in parse_id_ranges_file
node003 2020-08-27 13:00:59,702 MainThread WARNING toil.leader: kind-run_merge_gams/instanceg9052nwo        id_ranges.append((toks[0], int(toks[1]), int(toks[2])))
node003 2020-08-27 13:00:59,703 MainThread WARNING toil.leader: kind-run_merge_gams/instanceg9052nwo    ValueError: invalid literal for int() with base 10: "b'1'"
node003 2020-08-27 13:00:59,703 MainThread WARNING toil.leader: kind-run_merge_gams/instanceg9052nwo    ERROR:toil.worker:Exiting the worker because of a failed job on host node003
node003 2020-08-27 13:00:59,703 MainThread WARNING toil.leader: kind-run_merge_gams/instanceg9052nwo    WARNING:toil.jobGraph:Due to failure we are reducing the remaining retry count of job 'run_merge_gams' kind-run_merge_gams/instanceg9052nwo with ID kind-run_merge_gams/instanceg9052nwo to 0
node003 2020-08-27 13:00:59,703 MainThread WARNING toil.leader: Job 'run_merge_gams' kind-run_merge_gams/instanceg9052nwo with ID kind-run_merge_gams/instanceg9052nwo is completely failed
node003 2020-08-27 13:01:09,764 MainThread INFO toil.leader: Finished toil run with 4 failed jobs.
node003 2020-08-27 13:01:09,764 MainThread INFO toil.leader: Failed jobs at end of the run: 'run_whole_alignment' kind-run_whole_alignment/instancey4ychoz6 'run_merge_gams' kind-run_merge_gams/instanceg9052nwo 'Job' kind-Job/instanceqrcmeifw 'run_split_reads' kind-run_write_info_to_outstore/instancepsa71z_h

I encountered the same error previously and tracked it to the bytes repr wrapper b'...' being written into the id_ranges file, e.g.

1       b'1'    b'7790661'
2       b'7790662'      b'15393655'
...

I stripped out the b' characters from the id_ranges files but am still running into errors in run_merge_gams. Are there other files I should/can fix up? It looks like the problematic file is a temporary file.
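
For reference, a minimal sketch of how this kind of line can arise in Python 3. This is a guess at the mechanism and not the actual toil-vg code; the variable names are hypothetical. Formatting a bytes value into a str uses its repr, so the b'' wrapper lands in the file, and int() then fails with the same message as in the traceback.

```python
# Hypothetical reproduction; not the actual toil-vg code.
first_id = b"1"        # e.g. a value read from a subprocess, still bytes
last_id = b"7790661"

# In Python 3, formatting bytes into a str uses their repr,
# so the b'' wrapper ends up in the written text.
line = "{}\t{}\t{}".format("1", first_id, last_id)

# Parsing the line back fails the same way as in the traceback.
toks = line.split("\t")
try:
    int(toks[1])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(repr(line))
print(error_message)
```

If this is what happens, editing one id_ranges file by hand would not help for long, since any job that writes the file again would reintroduce the wrapper.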

adamnovak commented 3 years ago

Yeah, I think we want a .decode('utf-8') here and no random decodes in the rest of the function, so we pass around strings instead of bytes.

This whole loop should work in text and not bytes, as well.
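
A rough sketch of that idea, assuming the raw contents arrive as bytes: decode once at the boundary, then parse in text only. The function name parse_id_ranges_text is hypothetical and is not part of toil-vg; the tuple shape follows the `id_ranges.append(...)` line in the traceback.

```python
def parse_id_ranges_text(raw):
    """Hypothetical helper: decode once up front, then work only with str."""
    # Accept either bytes or str so callers do not need scattered decodes.
    text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
    id_ranges = []
    for line in text.splitlines():
        toks = line.split()
        # Skip blank or malformed lines; keep (name, first_id, last_id).
        if len(toks) == 3:
            id_ranges.append((toks[0], int(toks[1]), int(toks[2])))
    return id_ranges


ranges = parse_id_ranges_text(b"1\t1\t7790661\n2\t7790662\t15393655\n")
print(ranges)
```

The same rule would apply on the writing side: decode any bytes before formatting them into the id_ranges lines, so that the file only ever contains plain integers.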