hschult closed this issue 3 years ago
generate_data.py skips the validation, sorting and pickle creation steps when no new data was downloaded. So if an error occurred during one of those steps in a previous run, the pipeline skips them again on every later run and keeps failing. @Jasmin-Walter addressed this in 15db5ef472973f291f23ef09984589b7370a9c2e by adding a --redo_file_validation argument, but it still needs to be implemented in generate_data.py.
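A minimal sketch of how such a flag could gate the skipped steps, not the project's actual code. Only the --redo_file_validation name comes from this thread; build_parser and should_run_validation are hypothetical helpers made up for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser that exposes only the flag discussed here."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--redo_file_validation",
        action="store_true",
        help="run validation, sorting and pickle creation even if "
             "no new data was downloaded",
    )
    return parser


def should_run_validation(new_files_downloaded: bool,
                          redo_file_validation: bool) -> bool:
    """Run the post-download steps if there is new data OR the user forces it."""
    return new_files_downloaded or redo_file_validation
```

With a check like this, a run that downloads nothing can still repeat the steps that failed earlier, as long as the flag is passed on the command line.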
The problem persists. I tried running with the --redo_analysis argument:
python bin/tf_analyzer.py -g mm9 -b liver -t gata4 polr2a -c chr1 --redo_analysis
fetching 20 ATAC/DNAse-seq experiments ...
fetching 6 ChIP-seq experiments ...
kept 20 ATAC/DNAse-seq experiments
0 lines added to /mnt/workspace/rwiegan/git/jlu-bda-2020/data/download/linking_table.csv
creating queue ...
No new files to download.
No new data was downloaded, skipping validation, merging and sorting.
Reading in linkage table.
Now starting normalisation process. 0 files will be normalised.
------ Log scaling files ------
------ Finding global min/max values ------
------ Min-max scaling all files -------
Warning: No files were normalised. Please check logging for further information.
Traceback (most recent call last):
File "bin/tf_analyzer.py", line 261, in <module>
main()
File "bin/tf_analyzer.py", line 234, in main
scores, exist = scripts.score.findarea(args.width, args.genome.lower(), [x.lower() for x in args.biosource],
File "/mnt/workspace/rwiegan/git/jlu-bda-2020/bin/scripts/score.py", line 33, in findarea
atacdict = pickle.load(open(os.path.join(picklepath, genom, 'atac-seq', biosource + ".pickle"), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/workspace/rwiegan/git/jlu-bda-2020/data/pickledata/mm9/atac-seq/liver.pickle'
I also tried the --check_local_files parameter.
python bin/tf_analyzer.py -g mm9 -b liver -t gata4 polr2a -c chr1 --check_local_files /mnt/workspace/rwiegan/git/jlu-bda-2020/data/
fetching 20 ATAC/DNAse-seq experiments ...
fetching 6 ChIP-seq experiments ...
kept 20 ATAC/DNAse-seq experiments
0 lines added to /mnt/workspace/rwiegan/git/jlu-bda-2020/data/download/linking_table.csv
creating queue ...
No new files to download.
No new data was downloaded, skipping validation, merging and sorting.
Reading in linkage table.
Now starting normalisation process. 0 files will be normalised.
------ Log scaling files ------
------ Finding global min/max values ------
------ Min-max scaling all files -------
Warning: No files were normalised. Please check logging for further information.
Traceback (most recent call last):
File "bin/tf_analyzer.py", line 261, in <module>
main()
File "bin/tf_analyzer.py", line 234, in main
scores, exist = scripts.score.findarea(args.width, args.genome.lower(), [x.lower() for x in args.biosource],
File "/mnt/workspace/rwiegan/git/jlu-bda-2020/bin/scripts/score.py", line 33, in findarea
atacdict = pickle.load(open(os.path.join(picklepath, genom, 'atac-seq', biosource + ".pickle"), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/workspace/rwiegan/git/jlu-bda-2020/data/pickledata/mm9/atac-seq/liver.pickle'
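Both tracebacks above end in a bare FileNotFoundError from pickle.load. As a sketch only, a loader could check for the file first and point the user to the flag that recreates it. The path layout mirrors the one in the traceback; load_atac_pickle is a hypothetical name, not a function from score.py.

```python
import os
import pickle


def load_atac_pickle(picklepath: str, genome: str, biosource: str):
    """Load <picklepath>/<genome>/atac-seq/<biosource>.pickle with a clearer error."""
    path = os.path.join(picklepath, genome, "atac-seq", biosource + ".pickle")
    if not os.path.isfile(path):
        # Assumption: rerunning with --redo_file_validation recreates the pickle.
        raise FileNotFoundError(
            f"{path} does not exist. If a previous run failed during validation "
            "or pickle creation, rerun with --redo_file_validation."
        )
    # The context manager closes the file handle even if unpickling fails.
    with open(path, "rb") as handle:
        return pickle.load(handle)
```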
--redo_file_validation is the correct argument to use here, as I mentioned before. You may also need to update your conda environment with the latest environment.yml, since we added a different version of rename with the util-linux dependency.
Ah, I missed this parameter. With --redo_file_validation it worked.
OK, the error seems to be fixed now. Still, there are some warnings and errors you may want to look at. I prepared a summary below.
No new files to download.
validating files
Error - overlapping regions in bedGraph line 8 of /mnt/workspace/hschult/jlu-bda-2020/data/temp/ENCFF001YAM.chr5.bedgraph
Error - overlapping regions in bedGraph line 53 of /mnt/workspace/hschult/jlu-bda-2020/data/temp/ENCFF001YAM.chr7.bedgraph
Error - overlapping regions in bedGraph line 17 of /mnt/workspace/hschult/jlu-bda-2020/data/temp/ENCFF001YAM.chr10.bedgraph
Error - overlapping regions in bedGraph line 14 of /mnt/workspace/hschult/jlu-bda-2020/data/temp/ENCFF001YAM.chr11.bedgraph
Error - overlapping regions in bedGraph line 5 of /mnt/workspace/hschult/jlu-bda-2020/data/temp/ENCFF001YAM.chr13.bedgraph
...
unrecognized file format
unexpected file: ENCFF002ADS.chr1 chr1
unrecognized file format
unexpected file: ENCFF002ADS.chr2 chr2
unrecognized file format
unexpected file: ENCFF002ADS.chr3 chr3
unrecognized file format
unexpected file: ENCFF002ADS.chr4 chr4
unrecognized file format
unexpected file: ENCFF002ADS.chr5 chr5
unrecognized file format
unexpected file: ENCFF002ADS.chr6 chr6
unrecognized file format
unexpected file: ENCFF002ADS.chr7 chr7
unrecognized file format
....
sorting files
{'ctcf', 'polr2a', 'gata4'}
{'liver'}
Reading in linkage table.
Now starting normalisation process. 456 files will be normalised.
------ Log scaling files ------
- Log-scaling file 1 of 456: /mnt/workspace/hschult/jlu-bda-2020/data/mm9/liver/chip-seq/ctcf/ENCFF001YAM.chr1.bw
- Log-scaling file 2 of 456: /mnt/workspace/hschult/jlu-bda-2020/data/mm9/liver/chip-seq/ctcf/ENCFF001YAM.chr2.bw
- Log-scaling file 3 of 456: /mnt/workspace/hschult/jlu-bda-2020/data/mm9/liver/chip-seq/ctcf/ENCFF001YAM.chr3.bw
- Log-scaling file 4 of 456: /mnt/workspace/hschult/jlu-bda-2020/data/mm9/liver/chip-seq/ctcf/ENCFF001YAM.chr4.bw
- Log-scaling file 5 of 456: /mnt/workspace/hschult/jlu-bda-2020/data/mm9/liver/chip-seq/ctcf/ENCFF001YAM.chr5.bw
[bwHdrRead] There was an error while reading in the header!
[pyBwOpen] bw is NULL!
- File /mnt/workspace/hschult/jlu-bda-2020/data/mm9/liver/chip-seq/ctcf/ENCFF001YAM.chr5.bw could not be normalized. Please check logging for further info.
- Log-scaling file 6 of 456: /mnt/workspace/hschult/jlu-bda-2020/data/mm9/liver/chip-seq/ctcf/ENCFF001YAM.chr6.bw
- Log-scaling file 7 of 456: /mnt/workspace/hschult/jlu-bda-2020/data/mm9/liver/chip-seq/ctcf/ENCFF001YAM.chr7.bw
[bwHdrRead] There was an error while reading in the header!
[pyBwOpen] bw is NULL!
- File /mnt/workspace/hschult/jlu-bda-2020/data/mm9/liver/chip-seq/ctcf/ENCFF001YAM.chr7.bw could not be normalized. Please check logging for further info.
...
418 of 456files were successfully normalised. If not all files were normalised, check logging for further information.
[bwHdrRead] There was an error while reading in the header!
[pyBwOpen] bw is NULL!
Unable to open file /mnt/workspace/hschult/jlu-bda-2020/data/mm9/liver/chip-seq/ctcf/ENCFF001YAM.chr5.bw
[bwHdrRead] There was an error while reading in the header!
[pyBwOpen] bw is NULL!
Unable to open file /mnt/workspace/hschult/jlu-bda-2020/data/mm9/liver/chip-seq/ctcf/ENCFF001YAM.chr7.bw
[bwHdrRead] There was an error while reading in the header!
[pyBwOpen] bw is NULL!
Unable to open file /mnt/workspace/hschult/jlu-bda-2020/data/mm9/liver/chip-seq/ctcf/ENCFF001YAM.chr10.bw
[bwHdrRead] There was an error while reading in the header!
[pyBwOpen] bw is NULL!
Unable to open file /mnt/workspace/hschult/jlu-bda-2020/data/mm9/liver/chip-seq/ctcf/ENCFF001YAM.chr11.bw
[bwHdrRead] There was an error while reading in the header!
[pyBwOpen] bw is NULL!
...
This issue has been fixed. Wrongly formatted files had been able to pass through the pipeline.
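For illustration, the "overlapping regions" errors in the log above could be caught before conversion with a check like the following. It is a sketch, not the fix that was applied in the repository. It assumes the bedGraph lines are already sorted by chromosome and start position, and find_overlaps is a hypothetical helper.

```python
from typing import Iterable, List, Tuple


def find_overlaps(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, line) for records that overlap an earlier record
    on the same chromosome. Assumes sorted input."""
    problems: List[Tuple[int, str]] = []
    prev_chrom = None
    prev_end = 0
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        # Skip blank lines, comments and track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if chrom == prev_chrom:
            # bedGraph intervals are half-open, so start == prev_end is fine.
            if start < prev_end:
                problems.append((number, line))
            prev_end = max(prev_end, end)
        else:
            prev_chrom, prev_end = chrom, end
    return problems
```

A file for which this returns a non-empty list could be rejected or repaired before it is converted, so it never reaches the normalisation step.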
Call:
python bin/tf_analyzer.py -g mm9 -b liver
Error:
There are no pickle files in the folder given in the error message.