Open twang15 opened 2 years ago
This link has the file with all motif probability matrices: https://www.factorbook.org/motif/human/download . Search the file for the experiment ID (which is the same as the model ID) to find the motifs that were enriched in that experiment.
With wget:
wget https://screen-beta-api.wenglab.org/factorbook_downloads/complete-factorbook-catalog.meme.gz
gzip -d complete-factorbook-catalog.meme.gz
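To see which motifs belong to one experiment before splitting anything, a small helper along these lines may be enough. It is only a sketch: it assumes each motif starts with a line of the form "MOTIF <name>" and that the name contains the experiment ID (as in ENCSR437GBJ_TGGACTTTGRACYYW). The function name list_motifs is made up for illustration.

```shell
# list_motifs EXPERIMENT_ID MEME_FILE
# Print the names of all motifs whose name contains EXPERIMENT_ID.
# Assumption: every motif begins with a line "MOTIF <name>".
list_motifs() {
  awk -v id="$1" '$1 == "MOTIF" && index($2, id) { print $2 }' "$2"
}

# Example call (needs the downloaded catalog):
#   list_motifs ENCSR437GBJ complete-factorbook-catalog.meme
```

index() does a plain substring test, so characters in the ID are never treated as regular-expression syntax.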
Then split the records into separate files with AWK: https://www.gnu.org/software/gawk/manual/html_node/awk-split-records.html
https://stackoverflow.com/questions/14634349/calling-an-executable-program-using-awk
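Option 1. Split the MEME file into many small MEME files all at once, one file per motif, named after the motif. RS="" puts awk in paragraph mode, so each blank-line-separated block is one record. The $1 == "MOTIF" test skips the header blocks (otherwise a file named "version" is created from "MEME version 4"), and close() keeps awk from running out of open files. awk creates the output files itself, so no touch or pipe to bash is needed:
awk 'BEGIN { RS = "" } $1 == "MOTIF" { print $0 > $2; close($2) }' complete-factorbook-catalog.meme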
Option 2. Extract only the target MEME file on the fly (https://stackoverflow.com/questions/39384283/how-to-match-a-pattern-given-in-a-variable-in-awk). The match is a regular-expression match, so a bare experiment ID as the target selects every motif of that experiment:
awk -v target="ENCSR437GBJ_TGGACTTTGRACYYW" 'BEGIN { RS = "" } $1 == "MOTIF" && $2 ~ target { print $0 > $2; close($2) }' complete-factorbook-catalog.meme
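The per-motif files produced by the one-liners contain only the motif block, without the MEME header (version, alphabet, background frequencies), and tools that read MEME files may reject them. A possible variant that copies the header into every output file is sketched below. It assumes everything before the first "MOTIF" line is header and that motif names are safe to use as file names. The function name split_meme_with_header and the .meme suffix are illustrative choices.

```shell
# split_meme_with_header MEME_FILE OUTDIR
# Write one OUTDIR/<motif name>.meme per motif, each starting with a copy
# of the header paragraphs found before the first MOTIF record.
split_meme_with_header() {
  mkdir -p "$2"
  awk -v outdir="$2" '
    BEGIN { RS = "" }                      # paragraph mode: blank lines separate records
    $1 == "MOTIF" {
      if (out != "") close(out)            # finish the previous motif file
      out = outdir "/" $2 ".meme"
      printf "%s", header > out            # start the new file with the header
    }
    out != "" { printf "%s\n\n", $0 > out } # motif paragraph (or a continuation of it)
    out == "" { header = header $0 "\n\n" } # still before the first motif: collect header
  ' "$1"
}

# Example call (needs the downloaded catalog):
#   split_meme_with_header complete-factorbook-catalog.meme motifs
```

Paragraphs that follow a MOTIF paragraph without starting a new motif are appended to the current file, so the sketch also copes with a blank line between the MOTIF line and its matrix.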
Use SingleFile (https://github.com/gildas-lormeau/SingleFile) to save web pages to HTML from the command line. See here for more info: https://github.com/gildas-lormeau/SingleFile/blob/master/cli/README.MD.
On my mac:
# installation
npm install puppeteer@latest
sudo npm install -g "gildas-lormeau/SingleFile#master"
Troubleshooting: if the error message "UnhandledPromiseRejectionWarning: Error: Browser is not downloaded. Run "npm install" or "yarn install" at ChromeLauncher.launch" is displayed, it probably means that single-file was not able to find the browser executable. Passing the complete path of the executable with the option --browser-executable-path fixes this issue.
Find Chrome on my Mac: (https://superuser.com/questions/772131/where-is-google-chrome-located-on-a-mac)
# download complete web page as html
single-file --browser-executable-path="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" https://www.factorbook.org/tf/human/ELF1/motif/ENCSR975SSR ELF1.html # this works
single-file --browser-executable-path="/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome" https://www.factorbook.org/tf/human/ELF1/motif/ENCSR975SSR ELF1.html # this does not work: inside double quotes the backslashes are kept literally, so the path is wrong
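The difference between the two commands comes down to shell quoting, which can be checked without running single-file at all. The variable names below are made up for the demonstration; the path is shortened to the part that contains the space.

```shell
# Three ways of writing a path that contains a space.
quoted="/Applications/Google Chrome.app"    # double quotes protect the space
escaped=/Applications/Google\ Chrome.app    # unquoted: the backslash escapes the space and is removed
both="/Applications/Google\ Chrome.app"     # quoted AND escaped: the backslash stays in the string

# Print each value exactly as a program would receive it.
printf '%s\n' "$quoted" "$escaped" "$both"
```

The first two lines of output are identical. The third still contains the backslash, which is why the browser executable is not found in the second command.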
# extract the motif link
grep "hq-occurrences" ELF1.html | awk -F "=" '{for(i=1; i<=NF; i++) { if ($i ~ /hq-occurrences/) {split($i, a, "\""); print a[2]; } } }'
https://screen-beta-api.wenglab.org/factorbook_downloads/hq-occurrences/ENCFF133TSU_RCTTCCGG.gz
https://screen-beta-api.wenglab.org/factorbook_downloads/hq-occurrences/ENCFF133TSU_GRASCCGGAAGTGG.gz
https://screen-beta-api.wenglab.org/factorbook_downloads/hq-occurrences/ENCFF133TSU_TKRCGTCAYMRGNSSGCGCC.gz
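The same links can probably be pulled out with grep -o alone and handed straight to wget. This is a sketch that assumes the saved page keeps each link as a double-quoted attribute value, as the awk command above also assumes. The function name extract_hq_urls is made up.

```shell
# extract_hq_urls HTML_FILE
# Print every distinct hq-occurrences URL found in a saved page, one per line.
# Assumption: each URL sits inside a double-quoted attribute value.
extract_hq_urls() {
  grep -o 'https://[^"]*hq-occurrences/[^"]*' "$1" | sort -u
}

# Example call (needs ELF1.html and network access); -nc skips files already downloaded:
#   extract_hq_urls ELF1.html | xargs -n 1 wget -nc
```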
httrack is an alternative to single-file: https://alternativeto.net/software/save-page-we/
httrack --get https://www.encodeproject.org/experiments/ENCSR975SSR/ -O ELF1 -N ELF1.html
Problem statement:
We will have hundreds of models to explore, and their motif and PPM datasets on FactorBook have to be downloaded to SCG.
For example, this is the motif page for ELF1, which offers several downloadable files: https://www.factorbook.org/tf/human/ELF1/motif/ENCSR975SSR
However, we do not have the links to the motif files for batch downloading, nor the links to the motif PPMs.
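For the batch case, the single-page steps could be wrapped in a loop over a list of models. The sketch below is untested against the live site. It assumes a two-column list file (TF name, experiment ID, separated by whitespace), the URL pattern seen in the ELF1 example, and Chrome at its usual macOS path. motif_page_url, fetch_all and the list file are hypothetical names.

```shell
# motif_page_url TF EXPERIMENT_ID
# Build the FactorBook motif page URL, following the pattern of the ELF1 example.
motif_page_url() {
  printf 'https://www.factorbook.org/tf/human/%s/motif/%s\n' "$1" "$2"
}

# fetch_all LIST_FILE
# For each "TF EXPERIMENT_ID" line: save the page, then download its hq-occurrences files.
fetch_all() {
  chrome="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
  while read -r tf experiment; do
    [ -n "$tf" ] || continue                      # skip empty lines
    page="${tf}_${experiment}.html"
    # </dev/null keeps single-file from consuming the rest of the list on stdin
    single-file --browser-executable-path="$chrome" \
      "$(motif_page_url "$tf" "$experiment")" "$page" < /dev/null
    grep -o 'https://[^"]*hq-occurrences/[^"]*' "$page" | sort -u | xargs -n 1 wget -nc
  done < "$1"
}

# Example call (needs single-file, wget and network access):
#   fetch_all models.txt
```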