wiscstatman / EBSeq

Code for differential expression analysis from RNA-seq data....empirical Bayes...multiple groups

could the EBSeq run faster? #3

Closed: jianbinwu22 closed this issue 1 year ago

jianbinwu22 commented 1 year ago

Hello, I have used EBSeq to perform an analysis with 6 conditions at the isoform level, but a single iteration takes more than 12 hours when all patterns are accepted, and the hyper-parameter estimates had not converged after 5 iterations. Could the code be changed to support parallel computing? Also, maxround could perhaps become an automatic parameter that detects when the hyper-parameter estimates have converged. Finally, AllParti always confuses me when detecting differential expression. Thanks.
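The convergence-based stopping rule suggested above could look roughly like the following Python sketch. It is hypothetical: `iterate_until_converged`, `tol` and `toy_update` are illustrative names, not part of EBSeq, and EBSeq's actual hyper-parameter updates are not reproduced. The loop stops once the largest relative change in the parameters falls below `tol`, while `max_rounds` remains as a safety cap.

```python
from typing import Callable, List, Tuple


def iterate_until_converged(
    update: Callable[[List[float]], List[float]],
    init: List[float],
    tol: float = 1e-4,
    max_rounds: int = 100,
) -> Tuple[List[float], int, bool]:
    """Apply `update` until the largest relative change in any parameter falls
    below `tol`, or until `max_rounds` updates have been made.

    Returns (parameters, rounds_used, converged)."""
    params = list(init)
    for rounds in range(1, max_rounds + 1):
        new_params = update(params)
        # Largest relative change; the floor avoids dividing by zero.
        change = max(
            abs(new - old) / max(abs(old), 1e-12)
            for new, old in zip(new_params, params)
        )
        params = new_params
        if change < tol:
            return params, rounds, True
    return params, max_rounds, False


# Hypothetical stand-in for one round of hyper-parameter updates:
# a contraction whose fixed point is 2.0 in every coordinate.
def toy_update(p: List[float]) -> List[float]:
    return [0.5 * x + 1.0 for x in p]


params, rounds, converged = iterate_until_converged(toy_update, [0.0, 10.0])
print(converged, rounds)
```

Keeping a cap alongside the tolerance matters because an estimator that oscillates or converges very slowly would otherwise run indefinitely, which is costly when a single round takes hours.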

wiscstatman commented 1 year ago

Hi Kirong,

Thanks for reaching out about EBSeq. There is an improved version that is not yet in Bioconductor; let me find out its status and point you to that code.

again soon, -Michael N.

Michael Newton, Professor and Chair of Biostatistics and Medical Informatics, Professor of Statistics, www.stat.wisc.edu/~newton/


From: Kirong; Sent: Tuesday, May 2, 2023 7:53 AM; To: wiscstatman/EBSeq; Cc: Subscribed; Subject: [wiscstatman/EBSeq] could the EBSeq run faster? (Issue #3)

wiscstatman commented 1 year ago

Hi Kirong,

Are you using the Bioconductor version or the version at https://github.com/wiscstatman/EBSeq?

Thanks, -Michael N.

wiscstatman commented 1 year ago

One more thing Kirong,

Within the `scDDboost` Bioconductor package we have an accelerated version of EBSeq. See the function "EBS".

https://bioconductor.org/packages/release/bioc/html/scDDboost.html

scDDboost is an R package to analyze changes in the distribution of single-cell expression data between two experimental conditions. Compared to other methods that assess differential expression, scDDboost benefits uniquely from information conveyed by the clustering of cells into cellular subtypes. Through a novel empirical Bayesian formulation it calculates gene-specific posterior probabilities that the marginal expression distribution is the same (or different) between the two conditions. The implementation in scDDboost treats gene-level expression data within each condition as a mixture of negative binomial distributions.

From: MICHAEL A NEWTON; Sent: Tuesday, May 2, 2023 9:08 AM; To: wiscstatman/EBSeq; Cc: Subscribed; Subject: Re: [wiscstatman/EBSeq] could the EBSeq run faster? (Issue #3)

jianbinwu22 commented 1 year ago

Hi Michael Newton, thank you for reminding me of that. I have just found that EBSeq on GitHub is now version 2. I have just run the new version, and it works better than the previous one. To make it friendlier to new users, or to users less comfortable with math, I wonder whether the patterns and posterior probabilities could be presented more simply. I only want to know the difference between the groups of interest in a 6-condition analysis, but I got 203 patterns, so I had to check the output of GetMultiPP carefully. I also restricted the number of patterns to 10, but that raised a new question: what changes between the two methods? Besides the running time, do the posterior probabilities change for the same pattern? Suppose I have a 4-condition dataset, as in the EBSeq vignette on Bioconductor, and I only want to compare C1 and C2; each of the 15 patterns has either C1=C2 or C1≠C2. When gene A is assigned to any of pattern1 to pattern4, the result is effectively the same for me, because I know the posterior probability tells me there is no difference between C1 and C2. So should I cut off patterns 2 to 4? Honestly, I think it is better to use all patterns, but it should be simpler for users, especially beginners. Thanks for replying! [image: https://user-images.githubusercontent.com/92093490/235833584-4a6430ce-1875-4444-b2e9-913f56ae00ff.png]
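The pattern counts mentioned above (15 for 4 conditions, 203 for 6) follow from the fact that each expression pattern is a way of grouping the conditions into sets that share a mean, i.e. a set partition, and the number of set partitions of n conditions is the Bell number B(n). The Python sketch below is an illustration of that count, not EBSeq code; it computes Bell numbers with the Bell triangle.

```python
from typing import List


def bell_numbers(n_max: int) -> List[int]:
    """Return [B(0), B(1), ..., B(n_max)], computed with the Bell triangle.

    B(n) counts the ways to split n conditions into groups that share a mean,
    i.e. the number of possible expression patterns."""
    bells = [1]  # B(0) = 1
    row = [1]    # first row of the Bell triangle
    for _ in range(n_max):
        # Each row starts with the last entry of the previous row ...
        new_row = [row[-1]]
        # ... and each further entry is its left neighbour plus the entry
        # directly above that neighbour.
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])  # the first entry of row n is B(n)
    return bells


print(bell_numbers(6))  # [1, 1, 2, 5, 15, 52, 203]
```

The count grows very quickly (52 patterns for 5 conditions, 203 for 6), which presumably contributes to the long run times reported earlier when all patterns are accepted.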

wiscstatman commented 1 year ago

Hi Kirong, I'm forwarding your email to Xiuyu Ma [watsonforfun] who may be able to assist. best of luck, -Michael N.

From: Kirong; Sent: Wednesday, May 3, 2023 12:01 AM; To: wiscstatman/EBSeq; Cc: MICHAEL A NEWTON, Comment; Subject: Re: [wiscstatman/EBSeq] could the EBSeq run faster? (Issue #3)

xiuyuma commented 1 year ago

Hi @jianbinwu22,

For your first question ("what changes between the two methods?"): the new version of EBSeq filters out patterns that are unlikely to occur, and the posterior probabilities from the two methods are nearly identical. For more details, please refer to https://biorxiv.org/cgi/content/short/2020.06.19.162180v1
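Why dropping unlikely patterns barely moves the posterior probabilities can be illustrated with a toy calculation. This is a simplified sketch under an assumption, not EBSeq's actual filtering rule (see the bioRxiv paper for that): a gene's posterior over patterns is proportional to prior proportion times marginal likelihood, and filtering simply drops some patterns and renormalizes over the rest. If the dropped patterns carry little posterior mass, each retained probability is divided by a number close to 1. All numbers below are made up.

```python
from typing import Dict, Iterable


def posterior(prior: Dict[str, float], lik: Dict[str, float],
              keep: Iterable[str]) -> Dict[str, float]:
    """Posterior over the retained patterns: prior * likelihood, renormalized."""
    kept = list(keep)
    weights = {k: prior[k] * lik[k] for k in kept}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# Hypothetical prior pattern proportions and one gene's marginal likelihoods.
prior = {"P1": 0.70, "P2": 0.20, "P3": 0.09, "P4": 0.01}
lik = {"P1": 2.0e-5, "P2": 5.0e-5, "P3": 1.0e-6, "P4": 3.0e-6}

full = posterior(prior, lik, prior)             # all four patterns
filtered = posterior(prior, lik, ["P1", "P2"])  # unlikely P3 and P4 dropped
for k in filtered:
    print(k, round(full[k], 4), round(filtered[k], 4))
```

In this toy case the two sets of probabilities for P1 and P2 differ by less than 0.003.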

For the second question: if you are only interested in comparing C1 and C2 when four conditions are present in total, you can sum the posterior probabilities of all patterns in which C1 and C2 have equal means (e.g. patterns 1, 2, 3, 4 and 9) and compare that total with the sum over the remaining patterns, in which C1 and C2 have different means.
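The summation described above can be sketched in Python. The sketch enumerates all 15 patterns for four conditions as set partitions (written like a pattern matrix row, where conditions sharing a number share a mean), finds the 5 patterns in which C1 and C2 share a mean, and adds up their posterior probabilities. The enumeration order is my own and may not match EBSeq's pattern numbering, so in practice it is safer to select patterns by comparing the C1 and C2 entries than by index. The uniform posterior probabilities are placeholders.

```python
from typing import Dict, List, Tuple

Pattern = Tuple[int, ...]


def set_partitions(n: int) -> List[Pattern]:
    """All set partitions of n conditions, written as restricted-growth strings:
    pattern[i] is the group label of condition i + 1, and conditions with the same
    label share a mean, e.g. (1, 1, 2, 2) means C1 = C2 and C3 = C4."""
    patterns: List[Pattern] = []

    def extend(prefix: List[int], max_label: int) -> None:
        if len(prefix) == n:
            patterns.append(tuple(prefix))
            return
        # The next condition joins an existing group or opens a new one.
        for label in range(1, max_label + 2):
            extend(prefix + [label], max(max_label, label))

    extend([], 0)
    return patterns


patterns = set_partitions(4)
print(len(patterns))  # 15

# C1 and C2 share a mean exactly when they carry the same group label.
same_c1_c2 = [p for p in patterns if p[0] == p[1]]
print(len(same_c1_c2))  # 5

# Placeholder posterior probabilities for one gene (uniform, for illustration
# only); real values would be taken from the GetMultiPP output.
post: Dict[Pattern, float] = {p: 1.0 / len(patterns) for p in patterns}

p_equal = sum(post[p] for p in same_c1_c2)                        # P(C1 = C2 | data)
p_differ = sum(post[p] for p in patterns if p not in same_c1_c2)  # P(C1 != C2 | data)
print(round(p_equal, 3), round(p_differ, 3))
```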

jianbinwu22 commented 1 year ago

Hi xiuyuma, thanks for answering these questions; I know what to do now. What about the FDR, i.e. multiple-hypothesis-testing correction? I have merged the results of GetMultiPP into a contrast matrix, but EBSeq has an FDR parameter for two-condition contrasts, so should I apply a multiple-testing correction (Benjamini & Hochberg) to my data myself? Finally, I hope this can be added to the next version of EBSeq, which might attract more users.

xiuyuma commented 1 year ago

Hi @jianbinwu22,

Thanks for your suggestion. EBSeq already controls the FDR (https://academic.oup.com/bioinformatics/article/29/8/1035/228913), so no further adjustment is needed. EBSeq's results are posterior probabilities, estimated under a sampling model in which the proportions of the DE patterns pool information across all genes. They differ from the p-values that a conventional FDR correction requires in the traditional multiple-testing setting.
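Why no Benjamini-Hochberg step is needed can be seen from a standard Bayesian argument: given the data, the expected proportion of false discoveries in a list of genes called DE is the average of their posterior probabilities of equal expression (PPEE). The Python sketch below is a hypothetical illustration, not EBSeq's code (`call_de` and the PPEE values are made up); it grows the list from the smallest PPEE while that average stays at or below the target. For a multi-condition contrast such as C1 vs C2, the summed probability of the patterns with C1 = C2 would, by the same argument, play the role of PPEE; that use is an assumption, not something stated in this thread.

```python
from typing import List, Sequence


def call_de(ppee: Sequence[float], target_fdr: float = 0.05) -> List[int]:
    """Return indices of genes called DE such that the average posterior
    probability of equal expression (PPEE) over the called list, i.e. the
    expected false discovery proportion given the data, is <= target_fdr."""
    # Consider the most convincing genes (smallest PPEE) first.
    order = sorted(range(len(ppee)), key=lambda i: ppee[i])
    called: List[int] = []
    running_sum = 0.0
    for i in order:
        # PPEE values arrive in increasing order, so the running average can
        # only grow; stop at the first gene that would push it past the target.
        if (running_sum + ppee[i]) / (len(called) + 1) > target_fdr:
            break
        running_sum += ppee[i]
        called.append(i)
    return called


# Hypothetical PPEE values for seven genes.
ppee = [0.001, 0.30, 0.02, 0.08, 0.06, 0.50, 0.04]
de = call_de(ppee, 0.05)
print(sorted(de))  # [0, 2, 3, 4, 6]

# A stricter "hard" rule, calling only genes with PPEE <= 0.05, also keeps the
# average at or below 0.05 but calls fewer genes here.
hard = [i for i, p in enumerate(ppee) if p <= 0.05]
print(hard)  # [0, 2, 6]
```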