haowenz / chromap

Fast alignment and preprocessing of chromatin profiles
https://haowenz.github.io/chromap/
MIT License

An unknown error #142

Open xiadawei123 opened 1 year ago

xiadawei123 commented 1 year ago

Hi

I am using your software chromap to map Hi-C reads, but an error occurred during the alignment. I hope you can help. My commands and the error are shown below. Thank you.

conda create -n chromap_yahs -c bioconda -c conda-forge chromap samtools yahs samtools assembly-stats openjdk
samtools faidx contig.fa
chromap -i -r contig.fa -o index -w 14 -k 27
nohup chromap --preset hic -r contig.fa -x index --remove-pcr-duplicates -1 R1.fastq -2 R2.fastq --SAM -o aligned.sam -t 90 &

[image: screenshot of the error]

haowenz commented 1 year ago

It looks like some temp output files are missing. Can you check and make sure you have enough disk space for output files?

xiadawei123 commented 1 year ago

OK, thank you for your prompt reply. I will increase the memory and try again.

xiadawei123 commented 1 year ago

Hi, after making sure the server has sufficient memory (1.4 TB) and disk space, I reran the Hi-C alignment with chromap, but I still get the same error. I'm not sure whether it's due to insufficient memory or some other reason. Could you help me with this? Thank you.

haowenz commented 1 year ago

As I mentioned, you have too many reads and thus you need to make sure there is enough disk space for your output. The error message indicates that you don't. Memory is not related at all.

haowenz commented 1 year ago

It would be great if you could check whether you have enough disk space for your output. If not, maybe delete some of your old files to make room for it. Increasing memory does not help in this case.

haowenz commented 1 year ago

Besides, why did you use -k 27? Did the default value work?

xiadawei123 commented 1 year ago

I have a 47 TB disk, so I think there should be enough space. Could there be another reason? Since the genome is close to 10 G, I followed the advice you gave others earlier and increased k. I also tried the default k, but I still got the same error.

haowenz commented 1 year ago

I see. Can you run some command line to check your available disk space? I forget the exact command line. It might be "du -sh" or something else.

xiadawei123 commented 1 year ago

Yes, I often use "du -sh" or "df -h" to check disk space, and I reserved 47 TB for the chromap Hi-C alignment. Thank you very much for your reply. I will run it again and check at the end whether all 47 TB has been used up.
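For reference, `df` reports the free space on a filesystem, while `du` reports how much a directory's contents already use, so `df` is the one that answers the question about available space. A minimal sketch, assuming the output is written to the current directory (`out_dir` is a hypothetical variable, not something chromap reads):

```shell
# Hypothetical variable: the directory where aligned.sam and its temp files are written.
out_dir="${out_dir:-.}"

# Free space on the filesystem that holds out_dir (human-readable).
df -h "$out_dir"

# Space already used by the contents of out_dir (human-readable).
du -sh "$out_dir"

# Available space in kilobytes, taken from the POSIX-format df output
# (line 2, column 4 is "Available").
avail_kb=$(df -Pk "$out_dir" | awk 'NR==2 {print $4}')
echo "Available: ${avail_kb} KB"
```

Running `df` once before the alignment and again right after the failure shows whether the free space actually dropped to zero during the run.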

xiadawei123 commented 1 year ago

Hi, there's plenty of disk space, so I don't think the problem is related to disk space. If you have any ideas for solving it, please let me know. Thank you.

haowenz commented 1 year ago

This is weird. After the run, did you check whether the temporary mapping files are in the output dir? You can run "ls" to see if they are there. Also, can you remove "--remove-pcr-duplicates" from the command line? I guess it is not very useful for Hi-C. How many sequences are there in your contig.fa file?
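Both checks can be done from the shell. This is a rough sketch, assuming the temp files sit next to the output and are named `<output>.temp<N>` (the pattern seen later in this thread as `aligned.sam.temp1019`); the function names `count_temp_files` and `count_sequences` are made up for illustration:

```shell
# Count temp mapping files that share the output file's directory and name prefix.
# $1: the output path passed to chromap with -o (for example aligned.sam)
count_temp_files() {
  find "$(dirname "$1")" -maxdepth 1 -type f -name "$(basename "$1").temp*" | wc -l | tr -d ' '
}

# Count sequences in a FASTA file: each sequence has one header line starting with '>'.
# $1: path to the FASTA file (for example contig.fa)
count_sequences() {
  grep -c '^>' "$1"
}

# Example usage with the file names from the original command.
echo "temp files: $(count_temp_files aligned.sam)"
if [ -f contig.fa ]; then
  echo "sequences: $(count_sequences contig.fa)"
fi
```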

haowenz commented 1 year ago

Besides, can you show the beginning of your log?

xiadawei123 commented 1 year ago

Yes, I also find it very strange. The program successfully generated a large number of temporary files, each approximately 1 GB in size. I added "--remove-pcr-duplicates" because I need to use the SAM file from chromap as input to YaHS for chromosome building, and the YaHS instructions emphasize that PCR duplicates must be removed from the SAM file. Below I show some of the temporary files and the beginning of the log file. If you have any further suggestions, please let me know. Thank you again for your response.

[images: listing of some of the temporary files; beginning of the log file]

haowenz commented 1 year ago

The error message indicates that chromap was trying to open a temp mapping file but nothing is found. Initially, I was assuming your disk space was full and temp mapping files were not able to be generated and thus cannot be opened. But it seems that this is not the case. From the log, I didn't see errors.

This is hard to debug on my side because we cannot reproduce the error. If the dataset is publicly available, we can download it and try. Otherwise, we have to change the code a little so that it generates more error messages, and ask you to run it again so that we can understand what exactly happened. Alternatively, you can use bwa-mem in your pipeline. It would be much slower than Chromap in this case, but it might work.

xiadawei123 commented 1 year ago

Thanks again for your timely reply. We are using several methods in parallel for chromosome construction, including bwa mem. Yesterday I switched to a server with better performance and started another chromap run. If there is any problem, I will report back promptly.

huang-0323 commented 1 year ago

I have the same issue, and I'm sure I have enough disk space. Has there been any progress or a resolution? I appreciate your time and assistance.

haowenz commented 1 year ago

Can you provide your log?

huang-0323 commented 1 year ago

[image]
It creates a bunch of temp files, and the log shows:
[image]
My command line is:
nohup chromap --preset hic -r /home/data3/hsh/genome/maguan_goat_assembly/02.genome_with_hic_hifiasm/M11/M11.hic.p_ctg.fa -x /home/data3/hsh/genome/maguan_goat_assembly/02.genome_with_hic_hifiasm/M11/M11.hic.p_ctg.index --remove-pcr-duplicates -1 /home/data3/hsh/genome/maguan_goat_assembly/03.scaffold/M11/fastq/M11_hic_merge_R1.fastq.gz -2 /home/data3/hsh/genome/maguan_goat_assembly/03.scaffold/M11/fastq/M11_hic_merge_R2.fastq.gz --SAM -o M11.hic.aligned.sam -t 100 >> M11_chromap_align.log 2>&1 &

ghost commented 10 months ago

Hello, is this issue solved? I encountered a similar issue, and I suspect it may be caused by the size of the temp files. My server has 1.5 TB of memory and 15 TB of free disk space. Did you check the TempMappingFileHandle code (temp_mapping.h)? Maybe the files are too big for it to handle.

huang-0323 commented 10 months ago

Not yet. I think the SAM output function has an error; the other output options (--BED/--TagAlign) work fine.

haowenz commented 10 months ago

@xiadawei123 Were you able to run Chromap as you mentioned?

If any of you are using publicly available datasets, please let me know so that I can try to reproduce the error. It is impossible to debug with only these error messages.

utpala101 commented 7 months ago

@xiadawei123 Hi, have you solved the problem? I have the same problem with a relatively smaller genome (4G), and there is enough disk space to run it.

xiadawei123 commented 7 months ago

Hi, I am very sorry, but I was not able to solve this problem and switched to alternative software instead. I don't have any tips to give you. Good luck!

mourisl commented 7 months ago

@utpala101 In the new version, Chromap prints an error message naming the temp file it tried to open. This may provide some debugging information. Did the same error occur on your data?

utpala101 commented 7 months ago

@mourisl Yes, I have the same error. Chromap prints that a temp SAM file is missing, but the file is in the directory, so I don't know what's wrong.

utpala101 commented 7 months ago

@xiadawei123 Thank you so much for your timely reply! I will keep looking for a solution.

mourisl commented 7 months ago

What is the file name? Is it empty?

utpala101 commented 7 months ago

Sorry, I have deleted the file, but it was not empty. The file name is aligned.sam.temp1019

mourisl commented 6 months ago

Thank you for sharing the information. I think this may be related to the number of file handles a program can open on a Linux machine, where the default is 1024. Counting the input and output files as well, the 1019 temp files may reach that limit. We will add an option to specify the number of reads in each temp file so that the number of temp files can be reduced.
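One way to test this hypothesis is to compare the shell's open-file limit with the number of temp files from a failed run. The sketch below is illustrative only: `fd_check` is a made-up helper, and its default reserve of 8 descriptors for inputs, output, and standard streams is a rough assumption, not a number taken from Chromap.

```shell
# Soft limit: the value that applies to programs started from this shell.
soft_limit=$(ulimit -Sn)
# Hard limit: the ceiling the soft limit can be raised to without root.
hard_limit=$(ulimit -Hn)
echo "soft=$soft_limit hard=$hard_limit"

# Print "ok" if the temp files plus a reserve fit under the limit, else "over".
# $1: number of temp files, $2: open-file limit, $3: reserve (assumed default: 8)
fd_check() {
  if [ "$2" = "unlimited" ] || [ $(( $1 + ${3:-8} )) -le "$2" ]; then
    echo "ok"
  else
    echo "over"
  fi
}

# Example with the 1019 temp files reported in this thread.
echo "1019 temp files vs current soft limit: $(fd_check 1019 "$soft_limit")"
```

If this prints "over" on the machine where the run failed, the limit is at least a plausible cause; an "ok" does not rule out other problems.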

mourisl commented 6 months ago

I have updated the code so that each temp file holds more reads when too many temp files would otherwise be created, though this may increase memory usage. The updated code is in the li_dev7 branch. Could you please check out this branch and give it a try? Thank you!

utpala101 commented 6 months ago

@mourisl Sorry for the delay. I have tested the new code, but it still fails. The error messages are as follows:


Mapped all reads in 41092.57s.
Number of reads: 3393404416.
Number of mapped reads: 2658488394.
Number of uniquely mapped reads: 1961435054.
Number of reads have multi-mappings: 697053340.
Number of candidates: 580287500208.
Number of mappings: 2658488394.
Number of uni-mappings: 1961435054.
Number of multi-mappings: 697053340.
Temporary file aligned.sam.temp1019 is missing.
chromap: src/temp_mapping.h:45: void chromap::TempMappingFileHandle::InitializeTempMappingLoading(uint32_t) [with MappingRecord = chromap::SAMMapping; uint32_t = unsigned int]: Assertion `file != __null' failed.
Aborted (core dumped)


My working directory contained 1019 temp files, which matches the file named in the error, and the temp1019 file is much smaller than the earlier temp files (96 MB versus about 960 MB). I am not sure whether my data has a problem, but thank you for your work!

mourisl commented 6 months ago

Thank you for the testing! It is probably still my implementation error. I'll look into it.

bioswarm commented 4 months ago

You can try running "ulimit -n 4096" on the node of your cluster before starting chromap.
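A slightly more defensive version of that suggestion is sketched below. It raises the soft limit only as far as the hard limit allows and never lowers it. `target_nofile` is a hypothetical helper, and 4096 is simply the value suggested above, not a number confirmed to be sufficient.

```shell
# Print the soft limit to set: the requested value, capped at the hard limit
# and never lower than the current soft limit.
# $1: requested limit, $2: current soft limit, $3: hard limit
target_nofile() {
  want=$1; cur=$2; hard=$3
  if [ "$cur" = "unlimited" ]; then echo unlimited; return; fi
  if [ "$hard" != "unlimited" ] && [ "$want" -gt "$hard" ]; then want=$hard; fi
  if [ "$want" -lt "$cur" ]; then want=$cur; fi
  echo "$want"
}

new_limit=$(target_nofile 4096 "$(ulimit -Sn)" "$(ulimit -Hn)")
ulimit -Sn "$new_limit"
echo "open-file soft limit is now $(ulimit -Sn)"

# Start chromap from this same shell so that it inherits the limit, for example
# with the command from the original post:
# chromap --preset hic -r contig.fa -x index --remove-pcr-duplicates \
#   -1 R1.fastq -2 R2.fastq --SAM -o aligned.sam -t 90
```

The limit applies only to the current shell and the processes it starts, so on a cluster it has to be set inside the job script rather than on the login node.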

mourisl commented 4 months ago

Sorry for the delayed reply, @utpala101. I have updated the code in the li_dev7 branch so that each temp file can hold more reads; the branch should now be able to handle 20 billion reads. For debugging purposes, it also prints a warning whenever the temp file volume is increased. If you are still working on the data, could you please give it a try?