morris-lab / CellOracle

This is the alpha version of the CellOracle package

problem with integrate_tss_peak_with_cicero #27

Closed. wangmhan closed this issue 4 years ago

wangmhan commented 4 years ago

Hi,

I found that the output of ma.integrate_tss_peak_with_cicero is not reasonable: it only kept the connections with coaccess = 1. The number of peaks is 116374, the number of cicero_connections is 11976636, and the number of TSS annotations is 12511. After integration, only 12511 rows are left.

I wonder whether my custom TSS_annotation could be the reason for the abnormal output. I attached the data I used, and also the commands, which are from the tutorial. The cicero_connection file is too large to upload...

Attachments: galGal6_tss_info.bed.txt, step5_celloracle_step2a_tssIntegCicero.Rmd.txt, all_peaks.csv.zip

wangmhan commented 4 years ago

Hi,

I attached the first 800000 lines of cicero_connection.csv: cicero_connections_test.csv.zip

KenjiKamimoto-ac commented 4 years ago

Hi wangmhan, thank you for sending your data. I checked the files. The abnormal output is caused by the custom TSS_annotation.

We do not expect users to use a custom TSS annotation in the current version of celloracle. Although celloracle is open-source software and you can customize it, we DO NOT provide any support for customized code, and we do not guarantee that it will work normally if you customize the code. Please do so at your own risk and responsibility.

wangmhan commented 4 years ago

Hi Kenji,

Thanks for checking the code.

Is it possible to support the chicken genome?

KenjiKamimoto-ac commented 4 years ago

I'm sorry, but we cannot do it now.

This is because we use the HOMER database to generate the promoter-TSS annotation, and the default HOMER annotation database does not include chicken annotation data.

The available species are listed below. Human (hg18, hg19, hg38), Mouse (mm8, mm9, mm10), Rat (rn4, rn5, rn6), Frog (xenTro2, xenTro3), Zebrafish (danRer7), Drosophila (dm3), C. elegans (ce6, ce10), S. cerevisiae (sacCer2, sacCer3), S. pombe (ASM294v1), Arabidopsis (tair10), Rice (msu6).

wangmhan commented 4 years ago

Hi,

Thank you for your patience. I have one more question, as I am not sure I understand the logic correctly. Does the merge only detect a connection when a peak in cicero_connections is identical to a peak in the TSS annotation? In other words, are peaks that lie inside a TSS region, or that partially overlap it, taken into account? For example, suppose the TSS of geneA is chr1_100_200 and cicero_connections contains:

1 chr1_120_180 chr1_1000_2000 0.8
2 chr1_50_180 chr1_1000_2000 0.6
3 chr1_150_280 chr1_1000_2000 0.5
4 chr1_100_200 chr1_1000_2000 0.7

Will only the 4th row be selected?

If so, it does not seem very flexible.

KenjiKamimoto-ac commented 4 years ago

Our TSS annotation function takes the following cases into account.

(1) A peak partially overlaps a promoter-TSS reference region. (2) A peak lies entirely inside a promoter-TSS reference region.

These calculations are done with the celloracle function, ma.get_tss_info().
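To illustrate the difference between exact matching and overlap matching, here is a minimal sketch using the example peaks from the question above. It is not the implementation of ma.get_tss_info(); the helper names (Interval, parse_peak, overlaps) are hypothetical, and closed coordinates are assumed.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int
    end: int


def parse_peak(peak_id: str) -> Interval:
    """Parse a 'chr_start_end' peak ID into an Interval (hypothetical helper)."""
    # rsplit keeps chromosome names that themselves contain underscores intact.
    chrom, start, end = peak_id.rsplit("_", 2)
    return Interval(chrom, int(start), int(end))


def overlaps(a: Interval, b: Interval) -> bool:
    """True if a and b share at least one base (closed coordinates assumed)."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


# Example values taken from the question: TSS region of geneA and four peaks.
tss_id = "chr1_100_200"
tss = parse_peak(tss_id)
peaks = ["chr1_120_180", "chr1_50_180", "chr1_150_280", "chr1_100_200"]

# Exact string matching keeps only the peak identical to the TSS region.
exact_matches = [p for p in peaks if p == tss_id]

# Overlap matching also keeps peaks inside or partially overlapping the region.
overlap_matches = [p for p in peaks if overlaps(parse_peak(p), tss)]

print(exact_matches)
print(overlap_matches)
```

With exact matching only the 4th peak is kept, while the overlap test keeps all four; a distant peak such as chr1_1000_2000 is rejected by both.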

Again, the reason your cicero connections were not detected is that you are using a custom TSS annotation. The format of your data, tss_annotated, is different from what it should be, which is why you got no cicero connections.

Also, you cannot claim that our function is not flexible, because you are not using our function ma.get_tss_info(). Please use our function instead of a custom function.

I recommend trying our tutorial, which includes an example dataset. https://github.com/morris-lab/CellOracle/tree/master/docs/notebooks/01_ATAC-seq_data_processing/option1_scATAC-seq_data_analysis_with_cicero

It will help you understand how the function works on a tested dataset.

wangmhan commented 4 years ago

Thank you for the nice explanation.

Will try with the tested dataset.
