hoffmangroup / segway

Application for semi-automated genomic annotation.
http://segway.hoffmanlab.org/
GNU General Public License v2.0

Enable virtual evidence for posterior tasks #146

Closed: EricR86 closed this 3 years ago

EricR86 commented 4 years ago

It seems that virtual evidence was accidentally left out as an option for the posterior task because its argument parser was missing. It has been re-enabled, but it has proven difficult to test. Any feedback would be appreciated.

mariamarab commented 4 years ago

posterior-run gives an "Argument list too long" error:

GM12878_4/post/cmdline/identify/jt38.post.2f983bd2f85911eaa3717cd30ac741f6.sh: line 2: /project/6033554/arab/domain-annotations/annotations-hic/hic/egpr_test/run_egpr/chr10k/segway_posterior_fix/bin/segway-task: Argument list too long

Here is a summary of the command being run by Segway:

/project/6033554/arab/domain-annotations/annotations-hic/hic/egpr_test/run_egpr/chr10k/venv_segway2/bin/segway-task run posterior GM12878_3/post/posterior/posterior%s.18.bed chr11_res1000 1743 2343 1 0 2 1 seg ../data/res1000/GM12878.res1000.genomedata,../data/res1000/GM12878.res1000.genomedata,../data/res1000/GM12878.res1000.genomedata,../data/res1000/GM12878.res1000.genomedata,../data/res1000/GM12878.res1000.genomedata,../data/res1000/GM12878.res1000.genomedata,../data/res1000/GM12878.res1000.genomedata,../data/res1000/GM12878.res1000.genomedata,../data/res1000/GM12878.res1000.genomedata,../data/res1000/GM12878.res1000.genomedata,../data/res1000/GM12878.res1000.genomedata,../data/res1000/GM12878.res1000.genomedata,../data/res1000/GM12878.res1000.genomedata asinh_norm 0,1,2,3,4,5,6,7,8,9,10,11,12 True '[(1743, 1744), (1743, 1744), (1744, 1745), (1744, 1745), (1745, 1746), (1745, 1746), (1746, 1747), (1746, 1747), (1747, 1748), (1747, 1748), (1748, 1749), (1748, 1749), (1749, 1750), (1749, 1750), (1750, 1751), (1750, 1751), (1751, 1752), (1751, 1752), (1752, 1753), (1752, 1753), (1753, 1754), (1753, 1754), (1754, 1755), (1754, 1755),
...
'[{0: 0.24712634086608887}, {1: 0.7528736591339111}, {0: 0.26267687269082285}, {1: 0.7373231273091772}, {0: 0.18879308733960726}, {1: 0.8112069126603927},
...
1 -base 3 -cCliquePrintRange 1:1 -cliqueTableNormalize 0.0 -componentCache F -cppCommandOptions '-DCARD_SEG=2 -DCARD_SUPERVISIONLABEL=-1 -DINPUT_PARAMS_FILENAME=GM12878_3/train/params/params.params -DVIRTUAL_EVIDENCE=1 -DVIRTUAL_EVIDENCE_LIST_FILENAME=VE_PLACEHOLDER -DCARD_FRAMEINDEX=2000000 -DCARD_SUBSEG=1 -DSEGTRANSITION_WEIGHT_SCALE=1.0' -deterministicChildrenStore F -doDistributeEvidence T -eCliquePrintRange 1:1 -fmt1 binary -fmt2 binary -hashLoadFactor 0.98 -inputMasterFile GM12878_3/train/params/input.master -island T -iswp1 F -iswp2 F -jtFile GM12878_3/post/log/jt_info.posterior.txt -lst 100000 -nf1 13 -nf2 0 -ni1 0 -ni2 14 -obsNAN T -of1 GM12878_3/post/observations/float32.list -of2 GM12878_3/post/observations/int.list -pCliquePrintRange 1:1 -strFile GM12878_3/train/segway.str -triFile GM12878_3/post/triangulation/segway.str.2.1.posterior.trifile -verbosity 0

mariamarab commented 4 years ago

I've been able to confirm that the virtual evidence option works for the posterior task. My tests can be found .

EricR86 commented 4 years ago

@mariamarab previously, an entire chromosome's worth of virtual evidence was passed to each job (segway-task). I've now added a fix so that only the user-supplied virtual evidence coordinates that overlap the region being trained on (or annotated, or run through posterior) are passed to the spawned job. This should significantly alleviate the ARG_MAX ("Argument list too long") issue.
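To illustrate the idea, here is a minimal Python sketch of filtering virtual evidence down to the entries that overlap a job's region. It is not the actual Segway code: the function names `filter_virtual_evidence` and `serialized_length` are hypothetical, and it assumes half-open `(start, end)` coordinates paired one-to-one with `{label: prior}` dictionaries, as the command summary above suggests.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]   # assumed half-open (start, end) interval
Prior = Dict[int, float]  # label index -> prior probability


def filter_virtual_evidence(
    coords: List[Coord],
    priors: List[Prior],
    region_start: int,
    region_end: int,
) -> Tuple[List[Coord], List[Prior]]:
    """Keep only entries whose interval overlaps [region_start, region_end).

    Hypothetical helper for illustration; not part of Segway's API.
    """
    kept_coords: List[Coord] = []
    kept_priors: List[Prior] = []
    for (start, end), prior in zip(coords, priors):
        # Two half-open intervals overlap when each starts before the other ends.
        if start < region_end and end > region_start:
            kept_coords.append((start, end))
            kept_priors.append(prior)
    return kept_coords, kept_priors


def serialized_length(coords: List[Coord], priors: List[Prior]) -> int:
    """Rough size, in characters, of the two lists once passed as arguments."""
    return len(repr(coords)) + len(repr(priors))


# Made-up chromosome-wide evidence: two entries (one per label) per position.
all_coords = [(i, i + 1) for i in range(5000) for _ in range(2)]
all_priors = [{label: 0.5} for _ in range(5000) for label in (0, 1)]

# Region bounds taken from the command summary above (1743 to 2343).
region_coords, region_priors = filter_virtual_evidence(
    all_coords, all_priors, 1743, 2343
)

print("before:", serialized_length(all_coords, all_priors))
print("after: ", serialized_length(region_coords, region_priors))
```

On Unix-like systems, `os.sysconf("SC_ARG_MAX")` reports the total argument-size limit, which can be compared against the serialized length when checking whether a job's command line is likely to fit.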

I would really appreciate it if you would let me know if that helps.

Thanks!

EricR86 commented 3 years ago

@mariamarab I fixed a bug that occurred when virtual evidence was used at higher resolutions. This should resolve any issues you were having with mismatched observation file sizes.

Let me know if this helps!

EricR86 commented 3 years ago

@mariamarab have you had any issues thus far? I'd like to move to a review/merge on this PR as soon as possible.

mariamarab commented 3 years ago

Yes, I have tested it and it seems to be working now.

EricR86 commented 3 years ago

@michaelmhoffman this is ready for your review