mdozmorov / genome_runner


Annotation analysis breaks #102

Closed mdozmorov closed 9 years ago

mdozmorov commented 9 years ago

The enrichment analysis is relatively stable now, but annotation analysis can break. A possible reason (see the log below) is the long list of arguments (GFs) passed to 'annotationAnalysis'. If a lot of GFs are selected, the command becomes 'annotationAnalysis foi1.bed gf1.bed.gz gf2.bed.gz ...'.
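To check whether a given command is close to the limit, the size of the argument list can be estimated and compared with the system value. This is a minimal sketch, not code from genome_runner: the helper names `arg_list_bytes` and `arg_max` are hypothetical, the file names are placeholders, and the estimate ignores the environment variables and pointer arrays that also count toward the limit.

```python
import os


def arg_list_bytes(args):
    """Approximate size of the argument strings: each argument plus its NUL terminator."""
    return sum(len(os.fsencode(a)) + 1 for a in args)


def arg_max():
    """Return the system limit on argument + environment size, or None if unknown."""
    try:
        limit = os.sysconf("SC_ARG_MAX")
    except (AttributeError, ValueError, OSError):
        # os.sysconf is Unix-only, and the name may not be defined everywhere.
        return None
    return limit if limit > 0 else None


# Placeholder file names, standing in for the real FOI and GF paths.
gfs = ["gf%d.bed.gz" % i for i in range(1, 4)]
cmd = ["annotationAnalysis", "foi1.bed"] + gfs
print("argument bytes:", arg_list_bytes(cmd), "limit:", arg_max())
```

With full paths to several thousand GF files, the first number can plausibly exceed the second, which would match the `[Errno 7] Argument list too long` in the log.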

The limit on the argument list applies to its total length, not to the number of arguments. So it may be dangerous to set a fixed number of args and run annotation analysis in chunks, but it is an option. Another option is to run annotation analysis on a per-GF basis, like 'annotationAnalysis foi1.bed gf1.bed.gz', 'annotationAnalysis foi1.bed gf2.bed.gz', and so on.

The disadvantage of either approach is the additional step of column-wise concatenating the annotations from each run to get one annotation file for FOI1. This can be done with, e.g., 'paste annot1.txt <(cut -f2 annot2.txt) <(cut -f2 annot3.txt) > foi1.txt'. But for this to work, we need to fix the headers (remember the double tab in the headers) so that each column has its GF name above it. Right now, GF names are in the 3rd field, and are missing.
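The chunking option and the column-wise merge could be sketched as follows. This is only an illustration under assumptions: `chunk_by_bytes` and `merge_columns` are hypothetical helpers, the byte budget is an arbitrary caller-chosen value, and the merge assumes the headers are already fixed, so every run yields the same rows in the same order with the region in the first column.

```python
import os


def chunk_by_bytes(items, budget):
    """Split items into consecutive chunks whose encoded size stays within budget bytes.

    An item larger than the budget still gets a chunk of its own.
    """
    chunks, current, used = [], [], 0
    for item in items:
        size = len(os.fsencode(item)) + 1  # argument plus NUL terminator
        if current and used + size > budget:
            chunks.append(current)
            current, used = [], 0
        current.append(item)
        used += size
    if current:
        chunks.append(current)
    return chunks


def merge_columns(tables):
    """Column-wise merge of per-run tables (lists of rows, each a list of fields).

    Keeps every column of the first table and drops the first (region) column
    of the others, similar to paste combined with cut -f2-.
    """
    merged = []
    for rows in zip(*tables):
        line = list(rows[0])
        for row in rows[1:]:
            line.extend(row[1:])
        merged.append(line)
    return merged


# Placeholder GF names; a real budget would be derived from the system limit.
gfs = ["gf1.bed.gz", "gf2.bed.gz", "gf3.bed.gz"]
for chunk in chunk_by_bytes(gfs, 25):
    print(["annotationAnalysis", "foi1.bed"] + chunk)
```

Chunking by bytes rather than by a fixed count addresses the concern that the limit depends on total length. Each chunk would be run as its own 'annotationAnalysis' call, and the parsed outputs passed to `merge_columns` in chunk order.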

2015-08-05 18:50:00,602 INFO Annotation started

2015-08-05 18:50:00,603 INFO Running annotation analysis for All_autoimmune

2015-08-05 18:50:00,757 ERROR Traceback (most recent call last):

File "/home/mdozmorov/genome_runner/grsnp/hypergeom4.py", line 289, in get_annotation

out = subprocess.Popen(["annotationAnalysis"] + [foi] + gfs,stdout=tmp_file,stderr=tmp_error_file) # TODO enable ["--print-region-name"]

File "/usr/lib/python2.7/subprocess.py", line 710, in __init__

errread, errwrite)

File "/usr/lib/python2.7/subprocess.py", line 1335, in _execute_child

raise child_exception

OSError: [Errno 7] Argument list too long

2015-08-05 18:50:00,757 WARNING Traceback (most recent call last):

2015-08-05 18:50:00,758 WARNING File "/home/mdozmorov/genome_runner/grsnp/hypergeom4.py", line 755, in run_hypergeom

2015-08-05 18:50:00,758 WARNING anot = get_annotation(f,gfs).split("\n")

2015-08-05 18:50:00,758 WARNING File "/home/mdozmorov/genome_runner/grsnp/hypergeom4.py", line 310, in get_annotation

2015-08-05 18:50:00,758 WARNING raise e

2015-08-05 18:50:00,758 WARNING OSError: [Errno 7] Argument list too long

2015-08-05 18:50:00,758 ERROR None

2015-08-05 18:50:00,759 WARNING Traceback (most recent call last):

2015-08-05 18:50:00,759 WARNING File "/home/mdozmorov/genome_runner/grsnp/worker_hypergeom4.py", line 45, in run_hypergeom

2015-08-05 18:50:00,764 WARNING grsnp.hypergeom4.run_hypergeom(fois+"_full", gfs+"_full", bg_path,outdir,job_name,zip_run_files,bkg_overlaps_path,sett['root_data_dir'][db_version],run_annotation,run_randomization_test,padjust,pct_score,organism)

2015-08-05 18:50:00,765 WARNING File "/home/mdozmorov/genome_runner/grsnp/hypergeom4.py", line 773, in run_hypergeom

2015-08-05 18:50:00,765 WARNING raise Exception(e)

2015-08-05 18:50:00,766 WARNING Exception: [Errno 7] Argument list too long

2015-08-05 18:50:00,766 WARNING None