Closed lancelot-bio closed 2 months ago
Please show me your commands.
Hi,
for haphic:
~/01_Tools/HapHiC/haphic pipeline filtered.curated.fasta HiC.filtered.bam 10 --threads 88 --skip_allhic --correct_nrounds 2
for juicer.sh:
ln -s ~/01_PrimaryContig/test_Haphic/filtered.curated.fasta . samtools faidx filtered.curated.fasta ~/01_Tools/HapHiC/scripts/../utils/juicer pre -a -q 1 -o out_JBAT ~/01_PrimaryContig/test_Haphic/HiC.filtered.bam scaffolds.raw.agp filtered.curated.fasta.fai >out_JBAT.log 2>&1 (java -jar -Xmx32G ~/01_Tools/HapHiC/scripts/../utils/juicer_tools.1.9.9_jcuda.0.8.jar pre out_JBAT.txt out_JBAT.hic.part <(cat out_JBAT.log | grep PRE_C_SIZE | awk '{print $2" "$3}')) && (mv out_JBAT.hic.part out_JBAT.hic)
for juicer post:
~/01_Tools/HapHiC/utils/juicer post -o out_JBAT out_JBAT.review.assembly out_JBAT.liftover.agp scaffolds.fa
Thanks.
The command for juicer post
should be:
~/01_Tools/HapHiC/utils/juicer post -o out_JBAT out_JBAT.review.assembly out_JBAT.liftover.agp filtered.curated.fasta
Hi,
Ye, I found the problem, and it posted another error:
[E::assembly_to_agp] coordinates conversion error ptg000032l 0
Here is my agp and assembly file:
out_JBAT.zip
Thanks.
The contig IDs in the assembly file should start with ctg
(contig names converted by juicer pre). However, the first 18 contig IDs start with ptg
(original contig names). Did you modify the assembly file yourself?
Hi, Yes, when I realized the problem may related to the name, I tried to modify them, and, eventually, I forgot to reverse these actions.
Many thanks.
I woud like to ask you too because I have generated the files with the --quick_mode and so It seems I encountered the same error as you.
Do you think that I could yahs/agp_to_fasta.py into juicer post -a ?
Thank you for the answer. Quentin
It seems to be related to how we load the assemblies in JBAT because my scaffolds went from ctg to group1 because I have used a custom script to get the green contig bins maybe the other people did too or did you solve it ?
https://github.com/ShunOuchi/GreenHill/blob/main/utils/fasta_to_juicebox_assembly.py
I solved it using the script in the juicebox_scripts/ folder. Thanks again
@Isoris Hi, I'm running into this same issue. What script did you end up using?
So what your have to do is to go into the yahs directory and there will be a ditectory juicebox scripts and inside another juicebox scripts
Then in this directory there is like generate fasta from assembly or something like that. You need to first generate the agp file of your sequence. then the contig names should match the assembly file contig names and then you can generate the fasta... Let me search it again for you. I don't remember precisely...
On Thu, Nov 7, 2024, 3:58 AM bl24 @.***> wrote:
@Isoris https://github.com/Isoris Hi, I'm running into this same issue. What script did you end up using?
— Reply to this email directly, view it on GitHub https://github.com/zengxiaofei/HapHiC/issues/63#issuecomment-2460764728, or unsubscribe https://github.com/notifications/unsubscribe-auth/ASYS5TG4OZZJ7BLDZZQ6RF3Z7J7F3AVCNFSM6AAAAABN54YSU2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDINRQG43DINZSHA . You are receiving this because you were mentioned.Message ID: @.***>
But that worked because I renamed the contigs of the assembly file of juicebox manually with sed prior to using the convert to fasta script. And I could guess the names of the scaffold from their order because I didn't move them (reorder them like 1 2 3 4) but I kept the order of the Blue scaffold intact as in the initial ( non reviewed) assembly file. This way I could rename the reviewed file with the names in ghe AGP file. And use both to make the fasta.
Are you referring to juicebox_assembly_converter.py? If found that from the juicebox_scripts repository: https://github.com/phasegenomics/juicebox_scripts
Yes !! This one. Does it work?
On Fri, Nov 8, 2024, 2:12 AM bl24 @.***> wrote:
Are you referring to juicebox_assembly_converter.py? If found that from the juicebox_scripts repository: https://github.com/phasegenomics/juicebox_scripts
— Reply to this email directly, view it on GitHub https://github.com/zengxiaofei/HapHiC/issues/63#issuecomment-2463020797, or unsubscribe https://github.com/notifications/unsubscribe-auth/ASYS5TBKUB5HGKWYSKP7WBDZ7O3RXAVCNFSM6AAAAABN54YSU2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDINRTGAZDANZZG4 . You are receiving this because you were mentioned.Message ID: @.***>
You need to use the .review file from juicebox as assembly for the script.
On Fri, Nov 8, 2024, 2:23 AM Quentin Andrès @.***> wrote:
Yes !! This one. Does it work?
On Fri, Nov 8, 2024, 2:12 AM bl24 @.***> wrote:
Are you referring to juicebox_assembly_converter.py? If found that from the juicebox_scripts repository: https://github.com/phasegenomics/juicebox_scripts
— Reply to this email directly, view it on GitHub https://github.com/zengxiaofei/HapHiC/issues/63#issuecomment-2463020797, or unsubscribe https://github.com/notifications/unsubscribe-auth/ASYS5TBKUB5HGKWYSKP7WBDZ7O3RXAVCNFSM6AAAAABN54YSU2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDINRTGAZDANZZG4 . You are receiving this because you were mentioned.Message ID: @.***>
I did try it but I get this error:
Checking for breaks listed in .assembly and making them...
Traceback (most recent call last):
File "/wing2/users/bmlau/juicebox_scripts/juicebox_scripts/juicebox_assembly_converter.py", line 872, in
Do you have any debris in your assembly? Or fragments? Something like :::debris
It is because this script cannot accommodate with small debris you need maybe to copy your assembly file and to delete lines with debris
On Fri, Nov 8, 2024, 3:14 AM bl24 @.***> wrote:
I did try it but I get this error:
Checking for breaks listed in .assembly and making them... Traceback (most recent call last): File "/wing2/users/bmlau/juicebox_scripts/juicebox_scripts/juicebox_assembly_converter.py", line 872, in processed_assembly = JuiceboxConverter().process(fasta, assembly, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/wing2/users/bmlau/juicebox_scripts/juicebox_scripts/juicebox_assembly_converter.py", line 91, in process sequences = self._add_breaks(sequences, assembly_map) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/wing2/users/bmlau/juicebox_scripts/juicebox_scripts/juicebox_assembly_converter.py", line 300, in _add_breaks raise BadContigNameError("Unbroken contig {} failed to map!!".format(fragment_name)) BadContigNameError: Unbroken contig ctg00000001.1 failed to map!!
— Reply to this email directly, view it on GitHub https://github.com/zengxiaofei/HapHiC/issues/63#issuecomment-2463126648, or unsubscribe https://github.com/notifications/unsubscribe-auth/ASYS5TEJ7V7OXCUDNEFZ6YDZ7PC2NAVCNFSM6AAAAABN54YSU2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDINRTGEZDMNRUHA . You are receiving this because you were mentioned.Message ID: @.***>
The names of your contigs in your fasta should be the same as in Your assembly file
On Fri, Nov 8, 2024, 3:14 AM bl24 @.***> wrote:
I did try it but I get this error:
Checking for breaks listed in .assembly and making them... Traceback (most recent call last): File "/wing2/users/bmlau/juicebox_scripts/juicebox_scripts/juicebox_assembly_converter.py", line 872, in processed_assembly = JuiceboxConverter().process(fasta, assembly, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/wing2/users/bmlau/juicebox_scripts/juicebox_scripts/juicebox_assembly_converter.py", line 91, in process sequences = self._add_breaks(sequences, assembly_map) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/wing2/users/bmlau/juicebox_scripts/juicebox_scripts/juicebox_assembly_converter.py", line 300, in _add_breaks raise BadContigNameError("Unbroken contig {} failed to map!!".format(fragment_name)) BadContigNameError: Unbroken contig ctg00000001.1 failed to map!!
— Reply to this email directly, view it on GitHub https://github.com/zengxiaofei/HapHiC/issues/63#issuecomment-2463126648, or unsubscribe https://github.com/notifications/unsubscribe-auth/ASYS5TEJ7V7OXCUDNEFZ6YDZ7PC2NAVCNFSM6AAAAABN54YSU2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDINRTGEZDMNRUHA . You are receiving this because you were mentioned.Message ID: @.***>
Compare the initial assembly file ( before juicebox) to the final review file. Are the names matching ? Next are the review file contig names the same as the review assembly file ?
Next, remove lines in the final review file that have debris on it. And lines that end with 0 change to 1 to indicate its 1 nucleotide long for the contig and not 0 ( sometimes Juicebox can bug and when you cut at a certain zoom and have the camera locked it will cut the contig and generate a small contig that has a length of 0... )
On Fri, Nov 8, 2024, 3:18 AM Quentin Andrès @.***> wrote:
The names of your contigs in your fasta should be the same as in Your assembly file
On Fri, Nov 8, 2024, 3:14 AM bl24 @.***> wrote:
I did try it but I get this error:
Checking for breaks listed in .assembly and making them... Traceback (most recent call last): File "/wing2/users/bmlau/juicebox_scripts/juicebox_scripts/juicebox_assembly_converter.py", line 872, in processed_assembly = JuiceboxConverter().process(fasta, assembly, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/wing2/users/bmlau/juicebox_scripts/juicebox_scripts/juicebox_assembly_converter.py", line 91, in process sequences = self._add_breaks(sequences, assembly_map) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/wing2/users/bmlau/juicebox_scripts/juicebox_scripts/juicebox_assembly_converter.py", line 300, in _add_breaks raise BadContigNameError("Unbroken contig {} failed to map!!".format(fragment_name)) BadContigNameError: Unbroken contig ctg00000001.1 failed to map!!
— Reply to this email directly, view it on GitHub https://github.com/zengxiaofei/HapHiC/issues/63#issuecomment-2463126648, or unsubscribe https://github.com/notifications/unsubscribe-auth/ASYS5TEJ7V7OXCUDNEFZ6YDZ7PC2NAVCNFSM6AAAAABN54YSU2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDINRTGEZDMNRUHA . You are receiving this because you were mentioned.Message ID: @.***>
Okay, I removed the debris but then it gave me a different error: MissingFragmentError: Assembly out_JBAT.review2.assembly is missing the sequence for index 19
I'm confused about the contigs in my fasta needing to be the same as the assembly. Isn't the purpose of the liftover.agp file to have a conversion between the assembly and the fasta? I ask because I've able to make juicer post work on other assemblies that didn't use quick_mode.
Put nack the debris lines and the lines that you removed and make sure that the contig names in the scaffold.assembly ( before juicebox ) are the same as the .review.assembly
And the same as the fasta
Yes there's a bug that breaks the review file when we add fragments let me check my bash History
On Fri, Nov 8, 2024, 3:36 AM bl24 @.***> wrote:
Okay, I removed the debris but then it gave me a different error: MissingFragmentError: Assembly out_JBAT.review2.assembly is missing the sequence for index 19
I'm confused about the contigs in my fasta needing to be the same as the assembly. Isn't the purpose of the liftover.agp file to have a conversion between the assembly and the fasta? I ask because I've able to make juicer post work on other assemblies that didn't use quick_mode.
— Reply to this email directly, view it on GitHub https://github.com/zengxiaofei/HapHiC/issues/63#issuecomment-2463164262, or unsubscribe https://github.com/notifications/unsubscribe-auth/ASYS5TF4F3SPTPFHDEZAK73Z7PFNVAVCNFSM6AAAAABN54YSU2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDINRTGE3DIMRWGI . You are receiving this because you were mentioned.Message ID: @.***>
Comparing my original assembly with my review assembly and I noticed there are also things labeled "fragments" in my review file
ctg00000018.1:::fragment_1 18 2575032 ctg00000018.1:::fragment_2:::debris 19 775000 ctg00000018.1:::fragment_3 20 424305 ctg00000019.1:::fragment_1 21 237691 ctg00000019.1:::fragment_2:::debris 22 375000 ctg00000019.1:::fragment_3 23 7350695
I don't understand why you have debris ? In my opinion you can maybe try to remove chromosome boundaries and then move to debris?
Like in JBAT the debris are the unwanted contigs. Usually you remove the chromosome boundaries ?
The convert script cannot accomodate with the :::debris I think.
Thats why I cannot really cut and rebuild thr fasta it's like a bug?
On Fri, Nov 8, 2024, 3:44 AM bl24 @.***> wrote:
Comparing my original assembly with my review assembly and I noticed there are also things labeled "fragments" in my review file
ctg00000018.1:::fragment_1 18 2575032 ctg00000018.1:::fragment_2:::debris 19 775000 ctg00000018.1:::fragment_3 20 424305 ctg00000019.1:::fragment_1 21 237691 ctg00000019.1:::fragment_2:::debris 22 375000 ctg00000019.1:::fragment_3 23 7350695
— Reply to this email directly, view it on GitHub https://github.com/zengxiaofei/HapHiC/issues/63#issuecomment-2463175874, or unsubscribe https://github.com/notifications/unsubscribe-auth/ASYS5TAUZCMMWDXJX6MHW53Z7PGKDAVCNFSM6AAAAABN54YSU2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDINRTGE3TKOBXGQ . You are receiving this because you were mentioned.Message ID: @.***>
But that worked because I renamed the contigs of the assembly file of juicebox manually with sed prior to using the convert to fasta script. And I could guess the names of the scaffold from their order because I didn't move them (reorder them like 1 2 3 4) but I kept the order of the Blue scaffold intact as in the initial ( non reviewed) assembly file. This way I could rename the reviewed file with the names in ghe AGP file. And use both to make the fasta.
Here is what my .review file looks like
Comparing my original assembly with my review assembly and I noticed there are also things labeled "fragments" in my review file
ctg00000018.1:::fragment_1 18 2575032 ctg00000018.1:::fragment_2:::debris 19 775000 ctg00000018.1:::fragment_3 20 424305 ctg00000019.1:::fragment_1 21 237691 ctg00000019.1:::fragment_2:::debris 22 375000 ctg00000019.1:::fragment_3 23 7350695
The fragments in the review file are normal they are the green contigs squares in Juicebox. While the contigs ( on the left ) are the blue scaffolds in juicebox.
Your file seems normal. And what about your fasta ?
Can you grep the header names?
It's probably a matter of names or maybe it's related to the .1 ? After ctg ? The fasta file from which the .assembly file originates needs to be the one used in the script. Or maybe becauae your first contig has a problem or no fragment or ? Can you please show the top of your .review file?
Also maybe you need to change the file to
>ctg instead of ctg
I did try it but I get this error:
Checking for breaks listed in .assembly and making them... Traceback (most recent call last): File "/wing2/users/bmlau/juicebox_scripts/juicebox_scripts/juicebox_assembly_converter.py", line 872, in
processed_assembly = JuiceboxConverter().process(fasta, assembly, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/wing2/users/bmlau/juicebox_scripts/juicebox_scripts/juicebox_assembly_converter.py", line 91, in process sequences = self._add_breaks(sequences, assembly_map) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/wing2/users/bmlau/juicebox_scripts/juicebox_scripts/juicebox_assembly_converter.py", line 300, in _add_breaks raise BadContigNameError("Unbroken contig {} failed to map!!".format(fragment_name)) BadContigNameError: Unbroken contig ctg00000001.1 failed to map!!
Okay, I realize the reason why I was getting errors is because I was using a different fasta file. Instead of using the original fasta input for the HapHic pipeline I used the corrected_asm.fa for quick_view mode after the pipeline gave me the message:
"It seems that some chromosomes were grouped together (length ratio = 0.5) You could check whether the parameters used are correct / appropriate and then try to tune the parameters for assembly correction, contig / Hi-C link filtration, or Markov clustering"
Hi, After scaffolding, I followed the recommendation, executed the juicebox.sh, and used the out_JBAT.hic and out_JBAT.assembly to obtain a modified assembly in juicebox. However, when I ran the juicer post command, it posted an error:
[E::make_asm_dict_from_agp] sequence ptg000032l not found
I checked other files, 'ptg000032l' was presented in related files but the out_JBAT.review.assembly file. Is this error raised due to the changed name in out_JBAT.assembly?Thanks