jkimlab / DESCHRAMBLER

Whether we can input contig-level genomes? #24

Open huixianxie opened 1 month ago

huixianxie commented 1 month ago

Hello, I would like to inquire about the input data format. When preparing the config.SFs file, I observed that it is possible to input non-chromosomal data. Consequently, I prepared two datasets. The first dataset includes five species, comprising 2 with chromosome-level genomes and 3 with scaffold-level genomes. The analysis proceeded normally, and ancestral karyotypes were successfully reconstructed. Building on the first dataset, the second dataset incorporated contig-level genome data from four additional species. This resulted in the identification of 781 APCFs, but no ancestral karyotypes were constructed. All datasets were analyzed using a 1,000,000 (bp) block resolution. I am seeking clarification on whether the results from the second dataset are accurate. Have you or any colleagues encountered a similar situation?

Many thanks, I appreciate your prompt response. Huixian Xie

jkimlab commented 1 month ago

The resolution value may be too large for your second dataset because it is highly fragmented. You may try again using a smaller resolution value.

huixianxie commented 1 month ago

Dear all members of jkimlab,

Thanks for your prompt and helpful response.

I have conducted three separate analyses on my second dataset using different block resolutions. The results are as follows: (a) 1,715 APCFs at a 300,000 bp resolution; (b) 267 APCFs at a 500,000 bp resolution; and (c) 781 APCFs at a 1,000,000 bp resolution.

Upon attempting to reconstruct the ancestral karyotypes, the process generates a FASTA file. This file contains several contig data entries, with the number of contigs seemingly corresponding to the number of APCFs. I am curious to know if this is the expected outcome or if there might be an error in my approach.
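A quick way to check whether the number of FASTA entries really matches the APCF count is to count the header lines in the output file. The sketch below assumes a standard FASTA layout (one `>` header per record); the function name and the file name in the usage note are hypothetical and are not part of DESCHRAMBLER.

```python
def count_fasta_records(lines):
    """Count records in FASTA-formatted text by counting '>' header lines."""
    return sum(1 for line in lines if line.startswith(">"))
```

For example, `with open("ancestor.fa") as fh: print(count_fasta_records(fh))`, where `ancestor.fa` stands in for whatever FASTA file the run produced.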

I came across a statement in your published article that suggested I could input the contig-level genome for analysis. The sentence reads: "For ancestral chromosome reconstructions, we developed an algorithm (DESCHRAMBLER) that probabilistically determines the adjacencies of syntenic fragments using chromosome-scale and fragmented genome assemblies."

I am interpreting this to mean that the "fragmented genome" includes both scaffold and contig data. Is this understanding correct? Does the contig genome configuration file need to specify the karyotype when it is used?

I appreciate your expertise and guidance on this matter.

Best regards, Huixian Xie

jkimlab commented 1 month ago

You need to use a much smaller resolution, such as 10,000 bp or 1,000 bp, depending on the N50 of your contig assembly. What is the N50 of your contig assembly? If the assembly is highly fragmented, that is, if its N50 is too small, I don’t recommend using it in the reconstruction.
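For readers who need to answer the N50 question, the sketch below computes N50 from a FASTA file using the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The function names are illustrative assumptions and not part of DESCHRAMBLER, and the sketch does not say how small a resolution any given N50 calls for.

```python
def fasta_lengths(lines):
    """Yield the sequence length of each record in FASTA-formatted lines."""
    length = None  # None until the first '>' header is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif length is not None:
            length += len(line)
    if length is not None:
        yield length


def n50(lengths):
    """Return the N50 of a collection of sequence lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0
```

Usage might look like `with open("contigs.fa") as fh: print(n50(fasta_lengths(fh)))`, where `contigs.fa` is a placeholder path. The resulting value can then be compared against the block resolution chosen for the run.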

huixianxie commented 1 month ago

Thanks! I will try this as soon as possible and report back with the results.

huixianxie commented 3 weeks ago

Does the genome of the reconstructed ancestral karyotype depend on the number of Ancestral Pseudo-Consensus Fragments (APCFs)? If there are 25 APCFs, would this result in a FASTA genome file with more than 25 initial sequences? If there are 40 APCFs, would this result in a FASTA genome file with more than 40 initial sequences? When I said earlier that no ancestral karyotype had been constructed, I meant that the number of FASTA sequences in the genome file corresponded directly to the number of APCFs. The N50 values (in bp) of my four contig-level genomes are as follows: 3312782, 3102134, 4251547, and 93890545.

jkimlab commented 3 weeks ago

I don’t fully understand your question, but APCFs represent the structure of ancestral sequences. They are constructed by joining syntenic fragments obtained from the given genomes.
