Closed EricR86 closed 3 years ago
posterior-run gives an "Argument list too long" error
GM12878_4/post/cmdline/identify/jt38.post.2f983bd2f85911eaa3717cd30ac741f6.sh: line 2: /project/6033554/arab/domain-annotations/annotations-hic/hic/egpr_test/run_egpr/chr10k/segway_posterior_fix/bin/segway-task: Argument list too long
Here is a summary of the command being run by Segway:
/project/6033554/arab/domain-annotations/annotations-hic/hic/egpr_test/run_egpr/chr10k/venv_segway2/bin/segway-task run posterior GM12878_3/post/posterior/posterior%s.18.bed chr11_res1000 1743 2343 1 0 2 1 seg ../data/res1000/GM12878.res1000.genomedata,../data/res1000/GM12878.res1000.genomedata,../data/res1000/GM12878.res1000.genomedata,../data/res1000/GM12878.res1000.genomedata,../data/res1000/GM12878.res1000.genomedata,../data/res1000/GM12878.res1000.genomedata,../data/res1000/GM12878.res1000.genomedata,../data/res1000/GM12878.res1000.genomedata,../data/res1000/GM12878.res1000.genomedata,../data/res1000/GM12878.res1000.genomedata,../data/res1000/GM12878.res1000.genomedata,../data/res1000/GM12878.res1000.genomedata,../data/res1000/GM12878.res1000.genomedata asinh_norm 0,1,2,3,4,5,6,7,8,9,10,11,12 True '[(1743, 1744), (1743, 1744), (1744, 1745), (1744, 1745), (1745, 1746), (1745, 1746), (1746, 1747), (1746, 1747), (1747, 1748), (1747, 1748), (1748, 1749), (1748, 1749), (1749, 1750), (1749, 1750), (1750, 1751), (1750, 1751), (1751, 1752), (1751, 1752), (1752, 1753), (1752, 1753), (1753, 1754), (1753, 1754), (1754, 1755), (1754, 1755),
...
'[{0: 0.24712634086608887}, {1: 0.7528736591339111}, {0: 0.26267687269082285}, {1: 0.7373231273091772}, {0: 0.18879308733960726}, {1: 0.8112069126603927},
...
1 -base 3 -cCliquePrintRange 1:1 -cliqueTableNormalize 0.0 -componentCache F -cppCommandOptions '-DCARD_SEG=2 -DCARD_SUPERVISIONLABEL=-1 -DINPUT_PARAMS_FILENAME=GM12878_3/train/params/params.params -DVIRTUAL_EVIDENCE=1 -DVIRTUAL_EVIDENCE_LIST_FILENAME=VE_PLACEHOLDER -DCARD_FRAMEINDEX=2000000 -DCARD_SUBSEG=1 -DSEGTRANSITION_WEIGHT_SCALE=1.0' -deterministicChildrenStore F -doDistributeEvidence T -eCliquePrintRange 1:1 -fmt1 binary -fmt2 binary -hashLoadFactor 0.98 -inputMasterFile GM12878_3/train/params/input.master -island T -iswp1 F -iswp2 F -jtFile GM12878_3/post/log/jt_info.posterior.txt -lst 100000 -nf1 13 -nf2 0 -ni1 0 -ni2 14 -obsNAN T -of1 GM12878_3/post/observations/float32.list -of2 GM12878_3/post/observations/int.list -pCliquePrintRange 1:1 -strFile GM12878_3/train/segway.str -triFile GM12878_3/post/triangulation/segway.str.2.1.posterior.trifile -verbosity 0
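To gauge how close a command like the one above is to the limit, here is a minimal sketch that estimates the size of an argument list and compares it against the system's `ARG_MAX`. The helper names `argv_size_bytes` and `exceeds_limit` are hypothetical and not part of Segway. The estimate ignores the environment, which also counts toward the limit, and on Linux a single very long argument can hit a separate per-argument limit as well.

```python
import os


def argv_size_bytes(argv):
    """Approximate bytes used by argv: each argument plus its NUL terminator."""
    return sum(len(arg.encode("utf-8")) + 1 for arg in argv)


def exceeds_limit(argv, limit=None):
    """Return True if argv alone is larger than the limit (default: SC_ARG_MAX)."""
    if limit is None:
        # Unix-only query; the environment also counts toward this limit,
        # so treat a False result as optimistic.
        limit = os.sysconf("SC_ARG_MAX")
    return argv_size_bytes(argv) > limit


# Hypothetical argument resembling the coordinate list passed to segway-task
coords_arg = str([(i, i + 1) for i in range(200000)])
print(argv_size_bytes(["segway-task", "run", "posterior", coords_arg]))
```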
I've been able to confirm that the virtual evidence option works for the posterior task. My tests can be found .
@mariamarab Previously, an entire chromosome's worth of virtual evidence was passed to each job (segway-task). I've now added a fix so that only the user-supplied virtual evidence coordinates that overlap the region being trained on (or annotated, or run through posterior) are passed to the spawned job. This should significantly alleviate the ARG_MAX / "Argument list too long" issue.
I would really appreciate it if you would let me know if that helps.
Thanks!
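For illustration only, the overlap filtering described above could look roughly like the sketch below. It is not the actual Segway code: the function name `overlapping_coords` is hypothetical, and it assumes half-open `(start, end)` coordinates like the pairs in the command summary.

```python
def overlapping_coords(coords, region_start, region_end):
    """Keep only (start, end) pairs that overlap [region_start, region_end)."""
    return [
        (start, end)
        for start, end in coords
        if start < region_end and end > region_start
    ]


# Hypothetical chromosome-wide list of virtual evidence coordinates
all_coords = [(i, i + 1) for i in range(5000)]

# Window taken from the command summary above (1743 to 2343)
window_coords = overlapping_coords(all_coords, 1743, 2343)
print(len(all_coords), len(window_coords))  # prints "5000 600"
```

Each job then receives only the entries for its own window, so the argument length scales with the window size rather than the chromosome size.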
@mariamarab a bug involving higher resolutions and virtual evidence has been fixed. It should resolve any issues you had with mismatched observation file sizes.
Let me know if this helps!
@mariamarab have you had any issues thus far? I'd like to move to a review/merge on this PR as soon as possible.
Yes, I have tested it and it seems to be working now.
@michaelmhoffman this is ready for your review
It seems that virtual evidence was accidentally left out as an option for posterior because its parser was missing. It has been re-enabled but has proven difficult to test. Any feedback would be appreciated.