lh3 / pangene

Constructing a pangenome gene graph

Questions about how pangene defines genes #15

Open Fight-a-tiger-in-the-mountain opened 1 month ago

Fight-a-tiger-in-the-mountain commented 1 month ago

question2024-10-24.docx

Dear Teacher Li Heng, I am a postgraduate student. I ran your pipeline on a set of sheep assemblies and encountered some problems, so I would like to ask for your advice.

1. Regarding the definition of genes in your script: how does it define a gene, and how does it determine whether a gene is present in an assembly?

I ask because I obtained some results by following your process, but they look problematic. Some of my results are shown below.

image

The results show that the ASIP gene appears only in ASM1117029 and ASM2422226. However, I also compared the gene against all of the assemblies with BLAST, and the results are as follows.

image

My coding ability is not good enough to fully understand your script, so I would like to ask: what criterion does your script use to decide whether an individual contains a gene?

2. Regarding the interpretation of the results generated by your script: I used your code to generate some results and found the ASIP gene in the GFA file, as shown in the following figure.

image
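One way to check programmatically which assemblies traverse a gene in the graph is to scan the walk (W) lines of the GFA file. The sketch below assumes that the file uses standard GFA 1.1 W-lines (`W`, sample, haplotype index, sequence name, start, end, walk) and that segment names are gene names. The function `samples_with_segment` and the demo records are hypothetical illustrations, not part of pangene.

```python
import re
from collections import defaultdict


def samples_with_segment(gfa_lines, segment):
    """Return {sample: [sequence names]} for W-lines whose walk visits `segment`.

    Assumes GFA 1.1 W-lines with seven tab-separated fields, the last one
    being a walk such as ">geneA<geneB".
    """
    hits = defaultdict(list)
    for line in gfa_lines:
        if not line.startswith("W\t"):
            continue  # skip S-, L- and other record types
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # malformed W-line
        sample, seq_name, walk = fields[1], fields[3], fields[6]
        # Split the walk on the orientation signs; the first piece is empty.
        names = re.split(r"[<>]", walk)[1:]
        if segment in names:
            hits[sample].append(seq_name)
    return dict(hits)


# Hypothetical records for illustration only (not real pangene output).
demo = [
    "S\tASIP\t*\n",
    "S\tGRID1\t*\n",
    "W\tsampleA\t0\tctg1\t*\t*\t>GRID1>ASIP\n",
    "W\tsampleB\t0\tctg7\t*\t*\t>GRID1\n",
]
print(samples_with_segment(demo, "ASIP"))
```

On a real file, the same function could be called as `samples_with_segment(open("graph.gfa"), "ASIP")`, where `graph.gfa` is a placeholder path.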

The corresponding bubble diagram was also generated, as shown in the following figure.

image

The ASIP gene was also found in the Rtab file.

image
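The Rtab row for a gene can be turned into a list of assemblies in a few lines. This is a minimal sketch that assumes a Roary-style layout: a tab-separated header whose first column is the gene name followed by one column per sample, with `0` meaning absent and any other value meaning present. The function name `samples_with_gene` and the demo matrix are hypothetical.

```python
import csv
import io


def samples_with_gene(rtab_text, gene):
    """Return the samples whose column is non-zero for `gene`.

    Returns None if the gene has no row in the matrix.
    Assumes: first column = gene name, remaining columns = samples.
    """
    reader = csv.reader(io.StringIO(rtab_text), delimiter="\t")
    header = next(reader)
    samples = header[1:]
    for row in reader:
        if row and row[0] == gene:
            return [s for s, v in zip(samples, row[1:]) if v != "0"]
    return None


# Hypothetical matrix for illustration only (not real results).
demo_rtab = (
    "Gene\tsampleA\tsampleB\tsampleC\n"
    "ASIP\t1\t0\t1\n"
    "GRID1\t1\t1\t1\n"
)
print(samples_with_gene(demo_rtab, "ASIP"))
```

Comparing this list with the assemblies that have a BLAST hit would show exactly where the two methods disagree.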

However, I cannot find ASIP in the bubble file.

image

The same problem occurs with other genes, such as IFNT11, GRID1 and KLHL.

Those are my main questions. The data I used are listed in the following table (entries shown in red were removed).

| Breed | Assembly |
| --- | --- |
| Awassi | ASM4054305v1 |
| Bangladeshi_sheep | ASM3243364v1 |
| Charollais_sheep | ASM2241674v1 |
| Chinese_Merino_sheep | ASM2243282v1 |
| Dorper | ASM1914517v1 |
| East_Friesian | NWAFU_Friesian_1.0 |
| East_Friesian_sheep | ASM3343944v1 |
| Guide_black_fur_sheep | ASM4025935v1 |
| Hu_sheep | ASM1117029v1 |
| Hu_sheep | T2T-sheep1.0M |
| hu_sheep | T2T-sheep1.0 |
| Kazak_sheep | ASM2243284v1 |
| Kermani_sheep | ASM2243283v1 |
| Polled_Dorset_sheep | ASM2241691v1 |
| Qiaoke_sheep | ASM2241668v1 |
| Rambouillet | ARS-UI_Ramb_v3.0 |
| Romanov_sheep | ASM2422217v1 |
| Romney_sheep | ASM2253800v1 |
| Suffolk_sheep | ASM2241672v1 |
| Texel_sheep | ASM2241677v1 |
| Tibetan_sheep | CAU_O.aries_1.0 |
| Ujimqin_sheep | ASM2241675v1 |
| Waggir_sheep | ASM2422226v1 |
| White_Dorper_sheep | ASM2241669v1 |
| Yunnan_sheep | ASM2241678v1 |

The protein data I used is Ensembl's sheep (Rambouillet v2.0) protein set, Ovis_aries_rambouillet.ARS-UI_Ramb_v2.0.pep.all.fa.

The GTF file I used is the corresponding annotation file, Ovis_aries_rambouillet.ARS-UI_Ramb_v2.0.111.gtf.

My main code is as follows

image image image image

The attachments are my GFA file, Rtab file, bubble file, gene result file and BLAST comparison file, respectively.

I sincerely hope that you can give me some advice when you are not busy.

提问2024-10-24.docx question.zip