macarthur-lab / clinvar

This repo provides tools to convert ClinVar data into a tab-delimited flat file, and also provides that resulting tab-delimited flat file.

Job parse failure on system execution. ENV issues? #42

Closed raymond301 closed 6 years ago

raymond301 commented 7 years ago

master.py, line 149

The custom sorting is done via egrep and sort commands:

# sort
job.add(("cat " + "<(gunzip -c IN:%(tmp_dir)s/clinvar_table_normalized.%(fsuffix)s.tsv.gz

When executed on my system, this job fails with a syntax error, which is fairly uninformative.

[Sep 12 13:29:27]: --> Exec 1.4: cat <(gunzip -c output_tmp/clinvar_table_normalized.multi.b37.tsv.gz | head -1) <(gunzip -c output_tmp/clinvar_table_normalized.multi.b37.tsv.gz | tail -n +2 | egrep -v "^[XYM]" | sort -k1,1n -k2,2n -k3,3 -k4,4 ) <(gunzip -c output_tmp/clinvar_table_normalized.multi.b37.tsv.gz | tail -n +2 | egrep "^[XYM]" | sort -k1,1 -k2,2n -k3,3 -k4,4 ) | bgzip -c > output_tmp/tmp.2017-09-12_13.29.26.clinvar_allele_trait_pairs.multi.b37.tsv.gz

[Sep 12 13:29:27]: Output (last mod N/A): output_tmp/clinvar_allele_trait_pairs.multi.b37.tsv.gz [doesn't exist yet]

[Sep 12 13:29:28]: /bin/sh: -c: line 0: syntax error near unexpected token `('

[Sep 12 13:29:28]: /bin/sh: -c: line 0: cat <(gunzip -c output_tmp/clinvar_table_normalized.multi.b37.tsv.gz | head -1) <(gunzip -c output_tmp/clinvar_table_normalized.multi.b37.tsv.gz | tail -n +2 |

I'm trying to work through this right now, but if you have any suggestions....

raymond301 commented 7 years ago

I found this on stackoverflow:

You get the error because process substitution (the <(some command) part) is not a standard feature (defined in POSIX) in sh, which means it may work on some OS but may not in others or in the same OS with different configuration.

Looks like I will need to come up with a different solution since this issue is OS specific.
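One shell-independent option is to do the sort in Python instead of relying on process substitution. The following is only a minimal sketch, not the repository's code: `sort_clinvar_rows` is a hypothetical helper, and it assumes that the first four tab-separated columns are chromosome, position, ref and alt, and that every chromosome not starting with X, Y or M is a plain integer. String comparison here is by code point, so ties may order differently from a locale-aware `sort`.

```python
def sort_clinvar_rows(lines):
    """Return the header, then numeric-chromosome rows, then X/Y/M rows.

    Hypothetical helper: `lines` is a list of tab-delimited strings whose
    first element is the header line.
    """
    header, rows = lines[0], lines[1:]

    def fields(row):
        return row.rstrip("\n").split("\t")

    def numeric_key(row):
        # Roughly corresponds to: sort -k1,1n -k2,2n -k3,3 -k4,4
        f = fields(row)
        return (int(f[0]), int(f[1]), f[2], f[3])

    def lexical_key(row):
        # Roughly corresponds to: sort -k1,1 -k2,2n -k3,3 -k4,4
        f = fields(row)
        return (f[0], int(f[1]), f[2], f[3])

    is_xym = lambda row: row.startswith(("X", "Y", "M"))  # egrep "^[XYM]"
    autosomes = sorted((r for r in rows if not is_xym(r)), key=numeric_key)
    others = sorted((r for r in rows if is_xym(r)), key=lexical_key)
    return [header] + autosomes + others
```

The input lines could come from `gzip.open(path, "rt")`. The original pipeline compresses its output with bgzip, so that step would still need to run separately.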

raymond301 commented 7 years ago

Are you aware of this bug with Python subprocesses and gzip? https://blog.nelhage.com/2010/02/a-very-subtle-bug/

I tried moving the Unix commands into a shell script and appending that to the pypez job queue, but it breaks as well, with this error: gzip: stdout: Broken pipe

Still looking for a solution to this sort issue.
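If the "Broken pipe" message comes from the child process inheriting an ignored SIGPIPE from Python (an assumption, not confirmed in this thread), one possible workaround is to restore the default signal action in the child before it runs. This is a sketch using only the standard `subprocess` and `signal` modules; `run_pipeline` and `restore_sigpipe` are hypothetical names, and pypez is not involved. On Python 3, `subprocess` already restores SIGPIPE by default (`restore_signals=True`), so this mainly matters on Python 2.

```python
import signal
import subprocess


def restore_sigpipe():
    # Runs in the child just before exec: give SIGPIPE its default action
    # so that e.g. gunzip exits quietly when `head` closes the pipe early.
    signal.signal(signal.SIGPIPE, signal.SIG_DFL)


def run_pipeline(command):
    """Run a shell pipeline and return (returncode, stdout, stderr)."""
    proc = subprocess.Popen(
        command,
        shell=True,
        preexec_fn=restore_sigpipe,  # POSIX only
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    out, err = proc.communicate()
    return proc.returncode, out, err
```

For example, `run_pipeline("gunzip -c some_file.tsv.gz | head -1")` (with a placeholder file name) would run the kind of pipeline that appears in the failing job.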

nick-hahner commented 6 years ago

Force it to use bash? Try changing jr = pypez.JobRunner() to jr = pypez.JobRunner(shell='/bin/bash')
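For reference, with the standard library alone the same idea (running the command under bash rather than /bin/sh) can be sketched as below. `run_in_bash` is a hypothetical helper and `/bin/bash` is an assumed path. Whether `pypez.JobRunner` passes its `shell` argument through to `subprocess` in this way is not verified here.

```python
import subprocess


def run_in_bash(command, bash_path="/bin/bash"):
    """Run `command` through bash so process substitution <(...) is available.

    With shell=True, `executable` replaces the default /bin/sh.
    Returns (returncode, stdout, stderr).
    """
    proc = subprocess.Popen(
        command,
        shell=True,
        executable=bash_path,  # assumed location of bash
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    out, err = proc.communicate()
    return proc.returncode, out, err
```

Calling `run_in_bash("cat <(echo header) <(echo body)")` exercises the same `<(...)` syntax that /bin/sh rejected in the log above.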

raymond301 commented 6 years ago

Tried that. I don't understand the issue myself, but I talked with the IT guys, and apparently shell and bash call a common executable that uses the env path.

I'm not sure why this happens. I ended up just calling all the component Python scripts individually, and that serves my purpose. Removing the pypez dependency was the solution I came up with, because Python's child processes can be so tricky cross-platform.

I know that's not a resolution, but feel free to close this issue out, if you can't reproduce the same issue.

bw2 commented 6 years ago

Thanks @nick-hahner. It does sound like a shell version issue. I'm running this in bash (v4.4.12).