The enrichment analysis is relatively stable now, but the annotation analysis can break. A likely reason (see the log below) is the long list of arguments (GFs) passed to 'annotationAnalysis'. If many GFs are selected, the command becomes 'annotationAnalysis foi1.bed gf1.bed.gz gf2.bed.gz...............'.
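To check whether a given GF selection is near the limit, one can estimate the size of the command line before launching it. The sketch below is written for Python 3 (`os.fsencode` does not exist in the Python 2.7 shown in the log). It approximates the kernel's accounting: each argument and each `KEY=VALUE` environment entry costs its length, a NUL terminator and one pointer. It compares that estimate with `os.sysconf("SC_ARG_MAX")`. The function names and the 4096-byte margin are illustrative assumptions, not part of grsnp.

```python
import os
import struct

# Size of one argv/envp pointer on this platform (8 bytes on 64-bit builds).
PTR = struct.calcsize("P")


def argv_bytes(argv, env=None):
    """Estimate the bytes execve() needs for argv plus the environment.

    Each string costs its encoded length, a NUL terminator and one pointer.
    This approximates the kernel's accounting; it is not an exact copy of it.
    """
    if env is None:
        env = os.environ
    arg_size = sum(len(os.fsencode(a)) + 1 + PTR for a in argv)
    # Environment entries are passed to the child as "KEY=VALUE\0" strings.
    env_size = sum(len(os.fsencode(k)) + 1 + len(os.fsencode(v)) + 1 + PTR
                   for k, v in env.items())
    return arg_size + env_size


def fits_arg_limit(argv, env=None, margin=4096):
    """True if the estimated argv + env size stays below SC_ARG_MAX minus a margin."""
    limit = os.sysconf("SC_ARG_MAX")
    return argv_bytes(argv, env) + margin < limit
```

For example, `fits_arg_limit(["annotationAnalysis", "foi1.bed"] + gf_paths)` (with `gf_paths` a hypothetical list of the selected GF files) returns False for the kind of selection that produces the error in the log.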
The kernel judges the argument list not by the number of arguments but by their total length in bytes (the ARG_MAX limit, which also counts the environment). Running annotation analysis in chunks of a fixed number of GFs is therefore an option, but a risky one: a chunk of GFs with long file paths can still exceed the limit. Another option is to run annotation analysis once per GF: 'annotationAnalysis foi1.bed gf1.bed.gz', 'annotationAnalysis foi1.bed gf2.bed.gz', and so on. The disadvantage of either approach is the extra step of concatenating the annotations from each run column-wise to get one annotation file for FOI1, e.g., 'paste annot1.txt <(cut -f2 annot2.txt) <(cut -f2 annot3.txt) > foi1.txt'. For this to work, we need to fix the headers (remember the double tab in the headers) so that each column has its GF name above it. Right now, the GF names are in the 3rd field and are missing.
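Because the limit is on total bytes, chunks can be sized by byte budget instead of by a fixed GF count, which removes the risk noted above. The sketch below (Python 3; all names are hypothetical) packs GF paths into chunks under a byte budget and then joins the per-run annotation tables column-wise. With chunked runs each output carries several GF columns, so the join keeps every column after the first, like `cut -f2-` rather than `cut -f2`. It assumes the headers are already fixed (one GF name per column), that column 1 identifies the FOI region, and that rows come out in the same order in every run. The key check makes a mismatch fail loudly.

```python
import os
import struct

PTR = struct.calcsize("P")  # bytes per argv pointer


def chunk_by_bytes(paths, budget):
    """Split paths into consecutive chunks whose argv cost stays within budget bytes.

    A path costs its encoded length + NUL terminator + one pointer, so the
    number of GFs per chunk adapts to path length instead of being fixed.
    """
    chunks, current, size = [], [], 0
    for p in paths:
        cost = len(os.fsencode(p)) + 1 + PTR
        if cost > budget:
            raise ValueError("a single path exceeds the budget: %s" % p)
        if current and size + cost > budget:
            chunks.append(current)
            current, size = [], 0
        current.append(p)
        size += cost
    if current:
        chunks.append(current)
    return chunks


def read_tsv(path):
    """Read a tab-separated annotation file into a list of rows."""
    with open(path) as fh:
        return [line.rstrip("\n").split("\t") for line in fh]


def merge_columns(tables):
    """Column-wise join, like `paste t1 <(cut -f2- t2) <(cut -f2- t3)`.

    Keeps every column of the first table and appends columns 2+ of the others.
    Row counts and first-column keys are checked instead of silently trusted.
    """
    merged = [list(row) for row in tables[0]]
    for t in tables[1:]:
        if len(t) != len(merged):
            raise ValueError("tables have different numbers of rows")
        for out_row, row in zip(merged, t):
            if out_row[0] != row[0]:
                raise ValueError("row keys differ: %r vs %r" % (out_row[0], row[0]))
            out_row.extend(row[1:])
    return merged
```

A budget such as `os.sysconf("SC_ARG_MAX") // 2` leaves room for the environment and the fixed arguments. The halving is an arbitrary safety choice, not a measured value.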
2015-08-05 18:50:00,602 INFO Annotation started
2015-08-05 18:50:00,603 INFO Running annotation analysis for All_autoimmune
2015-08-05 18:50:00,757 ERROR Traceback (most recent call last):
File "/home/mdozmorov/genome_runner/grsnp/hypergeom4.py", line 289, in get_annotation
File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
File "/usr/lib/python2.7/subprocess.py", line 1335, in _execute_child
OSError: [Errno 7] Argument list too long
2015-08-05 18:50:00,757 WARNING Traceback (most recent call last):
2015-08-05 18:50:00,758 WARNING File "/home/mdozmorov/genome_runner/grsnp/hypergeom4.py", line 755, in run_hypergeom
2015-08-05 18:50:00,758 WARNING anot = get_annotation(f,gfs).split("\n")
2015-08-05 18:50:00,758 WARNING File "/home/mdozmorov/genome_runner/grsnp/hypergeom4.py", line 310, in get_annotation
2015-08-05 18:50:00,758 WARNING raise e
2015-08-05 18:50:00,758 WARNING OSError: [Errno 7] Argument list too long
2015-08-05 18:50:00,758 ERROR None
2015-08-05 18:50:00,759 WARNING Traceback (most recent call last):
2015-08-05 18:50:00,759 WARNING File "/home/mdozmorov/genome_runner/grsnp/worker_hypergeom4.py", line 45, in run_hypergeom
2015-08-05 18:50:00,764 WARNING grsnp.hypergeom4.run_hypergeom(fois+"_full", gfs+"_full", bg_path,outdir,job_name,zip_run_files,bkg_overlaps_path,sett['root_data_dir'][db_version],run_annotation,run_randomization_test,padjust,pct_score,organism)
2015-08-05 18:50:00,765 WARNING File "/home/mdozmorov/genome_runner/grsnp/hypergeom4.py", line 773, in run_hypergeom
2015-08-05 18:50:00,765 WARNING raise Exception(e)
2015-08-05 18:50:00,766 WARNING Exception: [Errno 7] Argument list too long
2015-08-05 18:50:00,766 WARNING None
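The traceback shows `get_annotation` in hypergeom4.py failing inside `subprocess` with `OSError: [Errno 7] Argument list too long`. `run_hypergeom` then re-raises it as a plain `Exception(e)`, which drops the errno. Any automatic fallback would therefore have to catch the `OSError` right next to the subprocess call, before it is wrapped. The sketch below does that and falls back to the per-GF runs discussed above. `run_annotation` and its `runner` parameter are hypothetical names, not the existing grsnp code. `runner` is injectable only so the fallback can be exercised without the real binary.

```python
import errno
import subprocess


def run_annotation(foi, gfs, runner=subprocess.check_output):
    """Run annotationAnalysis once with all GFs; fall back to one run per GF on E2BIG.

    Returns a list of output strings, one per run, to be merged column-wise.
    """
    try:
        return [runner(["annotationAnalysis", foi] + list(gfs),
                       universal_newlines=True)]
    except OSError as e:
        # E2BIG ("Argument list too long", errno 7 in the log) triggers the fallback;
        # any other OSError is a real failure and is re-raised unchanged.
        if e.errno != errno.E2BIG:
            raise
    return [runner(["annotationAnalysis", foi, gf], universal_newlines=True)
            for gf in gfs]
```

Per-GF runs cost one process per GF, so chunking by byte budget is likely the cheaper fallback when many GFs are selected.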