Closed ykkim0127 closed 1 month ago
Hi, sorry for the late reply. If there are no BX:Z: tags in the SAM file, GraphUnzip cannot retrieve the barcode information, and that is probably why the program fails. The question now is: where is the barcode information in your SAM file? If it is stored under a tag other than BX:Z:, tell me and I will add an option for GraphUnzip to read another tag. If the barcodes are not in the SAM file at all, you need to check how the barcode information is stored in the original FASTQ file (e.g. were the barcodes detached from the reads? Were they tagged as BX:Z:?). One possible explanation is that the barcodes have not yet been detached from the reads and that you need to run Long Ranger basic (https://support.10xgenomics.com/genome-exome/software/pipelines/latest/what-is-long-ranger) to detach them.
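To make the check described above concrete, here is a minimal sketch that counts how many alignment records in a SAM file carry a BX:Z: tag. The function name `count_barcode_tags` and the demo records are illustrative assumptions, not part of GraphUnzip; the only thing relied on is that SAM optional tags start at the 12th tab-separated column.

```python
import io
from collections import Counter


def count_barcode_tags(sam_lines, tag="BX"):
    """Count alignment records with and without the given string-typed tag.

    Hypothetical helper for illustration; it is not a GraphUnzip function.
    """
    counts = Counter()
    prefix = tag + ":Z:"
    for line in sam_lines:
        # Skip blank lines and SAM header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        optional = fields[11:]  # the first 11 columns are mandatory
        if any(field.startswith(prefix) for field in optional):
            counts["tagged"] += 1
        else:
            counts["untagged"] += 1
    return counts


# Tiny in-memory SAM with made-up records: one tagged, one untagged.
demo = io.StringIO(
    "@HD\tVN:1.6\n"
    "r1\t0\tctg1\t1\t60\t4M\t*\t0\t0\tACGT\tFFFF\tBX:Z:AAACCCGGGTTTAAAC-1\n"
    "r2\t0\tctg1\t5\t60\t4M\t*\t0\t0\tACGT\tFFFF\tNM:i:0\n"
)
print(count_barcode_tags(demo))
```

On a real file, the same function can be called on an open text handle. If the "tagged" count is zero, the barcodes never reached the SAM file and the FASTQ is the next place to look.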
Hi! I sent you some files via email.
Yes. There are no barcode tags in the SAM file, nor in the FASTQ. The header line contains very little information about the library apart from N:0:. Please see the attached file (2_edit.fastq). Does this mean the barcodes have already been detached, or are they still included in the sequence with only the tag missing?
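One way to approach this question is to look at the first few FASTQ records directly. The sketch below is only a rough diagnostic under stated assumptions: it assumes plain four-line FASTQ records, and it assumes that in raw 10x Chromium linked-read data the barcode occupies the first 16 bases of read 1 (`BARCODE_LEN` is that assumption, not something confirmed for this dataset). The function name `inspect_fastq` and the demo reads are hypothetical.

```python
import io
from itertools import islice

BARCODE_LEN = 16  # assumed barcode length at the start of read 1


def inspect_fastq(handle, max_reads=1000):
    """Report how many headers carry BX:Z: and how varied the read prefixes are."""
    total = 0
    with_tag = 0
    prefixes = set()
    while total < max_reads:
        record = list(islice(handle, 4))  # header, sequence, '+', qualities
        if len(record) < 4:
            break
        header = record[0].rstrip("\n")
        sequence = record[1].rstrip("\n")
        total += 1
        if "BX:Z:" in header:
            with_tag += 1
        prefixes.add(sequence[:BARCODE_LEN])
    return {
        "reads": total,
        "with_bx_tag": with_tag,
        "distinct_prefixes": len(prefixes),
    }


# Two made-up reads: one untagged header, one header with a BX:Z: tag.
demo = io.StringIO("\n".join([
    "@read1 1:N:0:0", "AAACCCGGGTTTAAACGATTACAGATTACA", "+", "F" * 30,
    "@read2 BX:Z:AAACCCGGGTTTAAAC-1", "GATTACAGATTACA", "+", "F" * 14,
]) + "\n")
print(inspect_fastq(demo))
```

If no header carries BX:Z: and many reads share identical 16-base prefixes, that would be consistent with barcodes still being attached to the sequence, but this is a hint rather than proof.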
I also found that hic_interactionmatrix.txt is filled with the same kind of lines as linkedreads_interactionmatric.txt (above). Please see the two attached files (hic/linkedsreads_interactionsmatrix.txt). I thought this output was wrong, but when I run graphunzip.py unzip with these two matrix files, it finishes properly and produces a final assembly.gfa and assembly.fasta without any error message. Below is the command line I used, followed by the messages printed when the command finished.
./graphunzip.py unzip -g assembly_graph.gfa -i hic_interactionmatrix.txt -k linkedreads_interactionmatric.txt -l m64062_m64032_3.gaf -o 220928.gfa -f 220928.fasta
WARNING: 221 contigs out of 2417 had no coverage information or coverage=0. If this is a widespread issue, please use --conservative mode
Loading the Hi-C interaction matrix
Loading the linked-reads interaction matrix
================
Everything loaded, moving on to untangling the graph
================
*Untangling the graph using long reads*
Reading the gaf file...
Finished going through the gaf file.
Building consensus bridges from all the long reads
Done building consensus bridges
Now we will determine through an iterative process what contigs of the assembly are present only once in the final genome
Out of 1625 supposed single-copy contigs, 26 were not actually haploid. Recomputing until all the single-copy contigs are robust
Let's move on to actually untangling the graph
Now we correct the last quirks by looking a posteriori at the graph
Merging contigs that can be merged...
*Done untangling the graph using long reads*
*Untangling the graph using Hi-C*
Normalizing the interaction matrix
Finished normalizing the interaction matrix
Determining the list of all knots of the graph that I will try to solve
Finished determining the list of knots, there are 196 of them. Now determining pairs of single-copy contigs that should be linked through other contigs.
Finished matching haploid contigs, now we'll move on to determining the paths linking them
Finished determining the paths, now modifying the graph and duplicating necessary contigs
Finished round of untangling number 1 . Untangled 30261 contigs. Going on one supplementary round if 30261 > 0 and if 1 < 2
Determining the list of all knots of the graph that I will try to solve
Finished determining the list of knots, there are 77 of them. Now determining pairs of single-copy contigs that should be linked through other contigs.
Finished matching haploid contigs, now we'll move on to determining the paths linking them
Finished determining the paths, now modifying the graph and duplicating necessary contigs
Finished round of untangling number 2 . Untangled 10834 contigs. Going on one supplementary round if 10834 > 0 and if 2 < 2
Merging contigs that can be merged...
*Done untangling the graph using Hi-C*
Now exporting the result
The problem is that the final assembly.fasta has more contigs than the draft assembly. Please see the two attached files (draft_report.txt, unzipped_report.txt). Specifically, the number of contigs increased from 1,102 to 1,544, and the number of contigs >= 50,000 bp decreased from 294 to 108. However, the N50 increased from 32,645,358 to 65,251,927. Could you please explain why the contigs are fragmented even after combining the 10X and Hi-C data? And is this related to the barcode information?
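As background for these numbers, the sketch below shows how N50 is computed and why it can rise even when the contig count goes up. The contig lengths are toy values chosen for illustration; they are not taken from draft_report.txt or unzipped_report.txt, and they do not claim to explain what happened in this particular assembly.

```python
def n50(lengths):
    """Return the length L such that contigs of length >= L hold at least half the bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


# Toy assemblies with the same total size (300 bases each).
draft = [100, 100, 50, 50]            # 4 contigs
unzipped = [200, 60, 10, 10, 10, 10]  # 6 contigs, one much longer

for name, lengths in (("draft", draft), ("unzipped", unzipped)):
    print(name, "contigs:", len(lengths), "N50:", n50(lengths))
```

In the toy case, joining two large contigs raises N50 while a handful of small leftover pieces raises the contig count, so the two statistics can move in opposite directions.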
Dear RolandFaure,
Thanks for replying. I have attached a few files for some additional questions. I would much appreciate it if you could examine those results. Thanks for your help.
Best regards, Yu-kyung Kim
Hi Yu-kyung, I'd love to take a look at your data. Because the email was sent via GitHub, I could not see the attached files. Could you send them directly to roland.faure@irisa.fr?
As for the weird lines, your message made me realize that this is normal behavior. GraphUnzip uses pickle.dump to write these files, so they are not human-readable. This should not concern you. For the linked reads, I cannot answer until I have seen the file. However, I do not think linked reads will be very useful if you already have long reads and Hi-C. As for the GraphUnzip result, I will have a look at it. What I will do (and you can too) is re-run GraphUnzip with the options -r and --dont_merge and visualize the resulting GFA in Bandage. This will give you a more precise idea of what GraphUnzip did.
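To illustrate why a file written with pickle.dump looks like garbage in a text editor, here is a small round trip using only the standard library. The dictionary `toy_matrix` and the file name are stand-ins invented for this example; the actual object GraphUnzip stores is not shown in this thread.

```python
import os
import pickle
import tempfile

# Stand-in data, NOT the structure GraphUnzip really writes.
toy_matrix = {("contig_1", "contig_2"): 12, ("contig_2", "contig_3"): 5}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_interactionmatrix.txt")

    # pickle writes a binary serialization, whatever the file extension says.
    with open(path, "wb") as handle:
        pickle.dump(toy_matrix, handle)

    # Reading the raw bytes shows why the file is unreadable as text.
    with open(path, "rb") as handle:
        raw = handle.read()

    # pickle.load restores the original Python object.
    with open(path, "rb") as handle:
        restored = pickle.load(handle)

print(raw[:20])
print(restored == toy_matrix)
```

Only unpickle files you produced yourself or otherwise trust, since loading a pickle can execute arbitrary code.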
Hi @ykkim0127, I've looked carefully at your results. GraphUnzip worked fine. Here is an explanation of your results:
In conclusion, the assembly you get as an output of GraphUnzip is an improved assembly with higher contiguity than the original assembly and no gaps. I hope I have been clear, don't hesitate to reply if there are still some points that remain unclear.
Roland
Hi! I got these errors while running graphunzip.py with linked reads.
,ignoring the line, are you sure the BX:Z: tags are there ?
The output is then filled only with these lines.
When I check the barcode information in the SAM file (with the -C option) using grep, there is no BX:Z:. I used 10x Genomics Chromium FASTQ files downloaded from the website. Is this error related to the missing barcode information?