Closed rwiegan closed 3 years ago
I think this may be happening because the initial linking table is not overwritten but appended to, and export_from_csv.r always reads the entire CSV. In that case, all we'd need to do is add some sort of "reset" argument to tf_analyzer.py so that generate_data.py doesn't explicitly tell csv.r to append its data.
Alternatively, we could delete the linking table each time an exception occurs.
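To make the "reset" idea concrete, here is a minimal sketch of a writer that either overwrites or appends to the linking table. This is not the project's actual code: the function name `write_linking_table`, the `reset` parameter, and the column names in the usage note are assumptions made up for illustration.

```python
import csv
from pathlib import Path


def write_linking_table(path, rows, fieldnames, reset=False):
    """Write rows to the linking table CSV.

    Hypothetical helper: overwrites the file when reset is True (or when the
    file is missing or empty), otherwise appends to the existing table.
    Returns the file mode that was used ("w" or "a").
    """
    path = Path(path)
    # A fresh file needs a header; an existing, non-empty one does not.
    fresh = reset or not path.exists() or path.stat().st_size == 0
    mode = "w" if fresh else "a"
    with path.open(mode, newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        if fresh:
            writer.writeheader()
        writer.writerows(rows)
    return mode
```

With something like this, a failed or changed request could be followed by a call with `reset=True`, so that stale entries from the previous call do not survive in the table. Whether the reset should be automatic or user-controlled is a separate design question.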
It seems to me that the download needs an additional filtering step that is separate from the data in the linking table.
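One way such a filtering step could look is sketched below. It builds the download queue from the current request only, independent of whatever earlier calls left in the linking table. The function name `select_for_download` and the row keys `"chromosome"` and `"filename"` are hypothetical; the real linking table may be structured differently.

```python
def select_for_download(rows, chromosomes=None, already_downloaded=()):
    """Return the rows that match the current request and are not on disk yet.

    rows: iterable of dicts with (assumed) keys "chromosome" and "filename".
    chromosomes: chromosomes requested in this call, or None for all.
    already_downloaded: filenames that are already present locally.
    """
    wanted = set(chromosomes) if chromosomes else None
    done = set(already_downloaded)
    selected = []
    for row in rows:
        # Skip entries left over from an earlier, broader request.
        if wanted is not None and row["chromosome"] not in wanted:
            continue
        # Skip files that were already fetched.
        if row["filename"] in done:
            continue
        selected.append(row)
    return selected
```

Called with `chromosomes=["chr1"]`, this would ignore chr2 to chr19 entries even if they are still listed in the table from a previous call.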
I let the download finish and got the results for all chromosomes.
Now, with the new changes to the pipeline, I again ran the following command:
python bin/tf_analyzer.py -g mm9 -b liver -t gata4 polr2a -c chr1
This runs through but gives some errors on the way:
mm9.chrom.sizes written to /mnt/workspace/rwiegan/git/jlu-bda-2020/data/download
fetching 20 ATAC/DNAse-seq experiments ...
fetching 6 ChIP-seq experiments ...
kept 20 ATAC/DNAse-seq experiments
0 lines added to /mnt/workspace/rwiegan/git/jlu-bda-2020/data/download/linking_table.csv
creating queue ...
No new files to download.
No new data was downloaded, skipping validation, merging and sorting.
------ Reading in linkage table ------
------ Now starting normalisation process. 22 files will be normalised ------
------ Log scaling files ------
------ Finding global min/max values ------
- Checking file 1 of 22
- Checking file 2 of 22
- Checking file 3 of 22
- Checking file 4 of 22
- Checking file 5 of 22
- Checking file 6 of 22
- Checking file 7 of 22
- Checking file 8 of 22
- Checking file 9 of 22
- Checking file 10 of 22
- Checking file 11 of 22
- Checking file 12 of 22
- Checking file 13 of 22
- Checking file 14 of 22
- Checking file 15 of 22
- Checking file 16 of 22
- Checking file 17 of 22
- Checking file 18 of 22
- Checking file 19 of 22
- Checking file 20 of 22
- Checking file 21 of 22
- Checking file 22 of 22
------ Min-max scaling files -------
- Scaling file 1 of 22.
- Scaling file 2 of 22.
- Scaling file 3 of 22.
- Scaling file 4 of 22.
- Scaling file 5 of 22.
- Scaling file 6 of 22.
- Scaling file 7 of 22.
- Scaling file 8 of 22.
- Scaling file 9 of 22.
- Scaling file 10 of 22.
- Scaling file 11 of 22.
- Scaling file 12 of 22.
- Scaling file 13 of 22.
- Scaling file 14 of 22.
- Scaling file 15 of 22.
- Scaling file 16 of 22.
- Scaling file 17 of 22.
- Scaling file 18 of 22.
- Scaling file 19 of 22.
- Scaling file 20 of 22.
- Scaling file 21 of 22.
- Scaling file 22 of 22.
22 of 22files were successfully normalised. If not all files were normalised, check logging for further information.
Analyzing: liver
Analyzing: polr2a
[bwHdrRead] There was an error while reading in the header!
[pyBwOpen] bw is NULL!
Unable to open file /mnt/workspace/rwiegan/git/jlu-bda-2020/data/mm9/liver/chip-seq/polr2a/ENCFF001LND.chr2.bw
[bwHdrRead] There was an error while reading in the header!
[pyBwOpen] bw is NULL!
Unable to open file /mnt/workspace/rwiegan/git/jlu-bda-2020/data/mm9/liver/chip-seq/polr2a/ENCFF001LND.chr3.bw
[bwHdrRead] There was an error while reading in the header!
[pyBwOpen] bw is NULL!
Unable to open file /mnt/workspace/rwiegan/git/jlu-bda-2020/data/mm9/liver/chip-seq/polr2a/ENCFF001LND.chr4.bw
[bwHdrRead] There was an error while reading in the header!
[pyBwOpen] bw is NULL!
Unable to open file /mnt/workspace/rwiegan/git/jlu-bda-2020/data/mm9/liver/chip-seq/polr2a/ENCFF001LND.chr6.bw
[bwHdrRead] There was an error while reading in the header!
[pyBwOpen] bw is NULL!
Unable to open file /mnt/workspace/rwiegan/git/jlu-bda-2020/data/mm9/liver/chip-seq/polr2a/ENCFF001LND.chr7.bw
[bwHdrRead] There was an error while reading in the header!
[pyBwOpen] bw is NULL!
Unable to open file /mnt/workspace/rwiegan/git/jlu-bda-2020/data/mm9/liver/chip-seq/polr2a/ENCFF001LND.chr10.bw
[bwHdrRead] There was an error while reading in the header!
[pyBwOpen] bw is NULL!
Unable to open file /mnt/workspace/rwiegan/git/jlu-bda-2020/data/mm9/liver/chip-seq/polr2a/ENCFF001LND.chr11.bw
[bwHdrRead] There was an error while reading in the header!
[pyBwOpen] bw is NULL!
Unable to open file /mnt/workspace/rwiegan/git/jlu-bda-2020/data/mm9/liver/chip-seq/polr2a/ENCFF001LND.chr12.bw
[bwHdrRead] There was an error while reading in the header!
[pyBwOpen] bw is NULL!
Unable to open file /mnt/workspace/rwiegan/git/jlu-bda-2020/data/mm9/liver/chip-seq/polr2a/ENCFF001LND.chr13.bw
[bwHdrRead] There was an error while reading in the header!
[pyBwOpen] bw is NULL!
Unable to open file /mnt/workspace/rwiegan/git/jlu-bda-2020/data/mm9/liver/chip-seq/polr2a/ENCFF001LND.chr16.bw
[bwHdrRead] There was an error while reading in the header!
[pyBwOpen] bw is NULL!
Unable to open file /mnt/workspace/rwiegan/git/jlu-bda-2020/data/mm9/liver/chip-seq/polr2a/ENCFF001LND.chr19.bw
[bwHdrRead] There was an error while reading in the header!
[pyBwOpen] bw is NULL!
Unable to open file /mnt/workspace/rwiegan/git/jlu-bda-2020/data/mm9/liver/chip-seq/polr2a/ENCFF001YAN.chr2.bw
[bwHdrRead] There was an error while reading in the header!
[pyBwOpen] bw is NULL!
Unable to open file /mnt/workspace/rwiegan/git/jlu-bda-2020/data/mm9/liver/chip-seq/polr2a/ENCFF001YAN.chr3.bw
[bwHdrRead] There was an error while reading in the header!
[pyBwOpen] bw is NULL!
Unable to open file /mnt/workspace/rwiegan/git/jlu-bda-2020/data/mm9/liver/chip-seq/polr2a/ENCFF001YAN.chr4.bw
[bwHdrRead] There was an error while reading in the header!
[pyBwOpen] bw is NULL!
Unable to open file /mnt/workspace/rwiegan/git/jlu-bda-2020/data/mm9/liver/chip-seq/polr2a/ENCFF001YAN.chr6.bw
[bwHdrRead] There was an error while reading in the header!
[pyBwOpen] bw is NULL!
Unable to open file /mnt/workspace/rwiegan/git/jlu-bda-2020/data/mm9/liver/chip-seq/polr2a/ENCFF001YAN.chr7.bw
[bwHdrRead] There was an error while reading in the header!
[pyBwOpen] bw is NULL!
Unable to open file /mnt/workspace/rwiegan/git/jlu-bda-2020/data/mm9/liver/chip-seq/polr2a/ENCFF001YAN.chr10.bw
[bwHdrRead] There was an error while reading in the header!
[pyBwOpen] bw is NULL!
Unable to open file /mnt/workspace/rwiegan/git/jlu-bda-2020/data/mm9/liver/chip-seq/polr2a/ENCFF001YAN.chr11.bw
[bwHdrRead] There was an error while reading in the header!
[pyBwOpen] bw is NULL!
Unable to open file /mnt/workspace/rwiegan/git/jlu-bda-2020/data/mm9/liver/chip-seq/polr2a/ENCFF001YAN.chr12.bw
[bwHdrRead] There was an error while reading in the header!
[pyBwOpen] bw is NULL!
Unable to open file /mnt/workspace/rwiegan/git/jlu-bda-2020/data/mm9/liver/chip-seq/polr2a/ENCFF001YAN.chr13.bw
[bwHdrRead] There was an error while reading in the header!
[pyBwOpen] bw is NULL!
Unable to open file /mnt/workspace/rwiegan/git/jlu-bda-2020/data/mm9/liver/chip-seq/polr2a/ENCFF001YAN.chr16.bw
[bwHdrRead] There was an error while reading in the header!
[pyBwOpen] bw is NULL!
Unable to open file /mnt/workspace/rwiegan/git/jlu-bda-2020/data/mm9/liver/chip-seq/polr2a/ENCFF001YAN.chr19.bw
Finished analysis of polr2a
Analyzing: gata4
Finished analysis of gata4
Finished analysis of liver
analyse_main.py: unpacking
unpacking: liver
analysing: polr2a
I'm not sure what went wrong with your last run, but the main issue should be resolved now. The download script is now called with the same arguments as the CSV script. Maybe that also resolves your new problem.
I have the same errors; see #40.
With the newest version, I still get the "Unable to open file" errors.
@hschult @rwiegan Could you test again if the issue still persists? There have been some fixes pushed that might resolve this issue.
See #40. This should have been resolved by ensuring no malformed files pass through the pipeline.
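For anyone who wants to check their existing files by hand, the sketch below tests whether a file at least starts with the bigWig magic number (0x888FFC26, stored in either byte order). It is not the check the pipeline uses, and it only inspects the first four bytes, so a file can pass and still be truncated further in. The function name `looks_like_bigwig` is made up for this example.

```python
import struct

# Magic number at the start of every bigWig file.
BIGWIG_MAGIC = 0x888FFC26


def looks_like_bigwig(path):
    """Return True if the first four bytes of path match the bigWig magic."""
    try:
        with open(path, "rb") as handle:
            head = handle.read(4)
    except OSError:
        # Missing or unreadable files are treated as invalid.
        return False
    if len(head) < 4:
        # Empty or truncated file, e.g. from an interrupted download.
        return False
    little = struct.unpack("<I", head)[0]
    big = struct.unpack(">I", head)[0]
    return BIGWIG_MAGIC in (little, big)
```

Running this over the `.bw` files named in the "Unable to open file" messages should show quickly whether they are empty or otherwise malformed leftovers from an earlier run.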
I called the script with the following parameters:
python bin/tf_analyzer.py -g mm9 -b liver -t gata4 polr2a
This call failed due to a lost connection to DeepBlue.
Because this downloads quite a few files, I changed my mind and only wanted to download chr1.
Calling
python bin/tf_analyzer.py -g mm9 -b liver -t gata4 polr2a -c chr1
still started downloading all chromosomes. Seems like it saves the request from the last call.