ncats / RaMP-DB


Biological Pathway Enrichment Issues with Web GUI #58

Closed jnstyles5 closed 7 months ago

jnstyles5 commented 1 year ago

Hi, I am analyzing around 115 HMDB IDs from metabolites identified in blood samples and was trying to perform biological pathway enrichment. However, the enrichment only completes when no sample type is selected, and p-values are never calculated, even with a subset of the data. Additionally, I cannot download the output for the full dataset, only for a subset. I am currently attempting to switch to running this analysis in R on my local device. Will that help? Any other suggestions? Thanks for your help!

johnbraisted commented 1 year ago

Hello, we've recently found that choosing a biospecimen together with certain relaxed default pathway clustering parameters can cause the pathway clustering to essentially time out and not return a clustering image or even a full result.

This issue and the download issue you describe are both patched in our current RaMP package, which is deployed to our development website but not yet to our production website. I'll contact our IT to coordinate the deployment to production.

If you have familiarity with R and are able to install a local MySQL instance of the database as described in our README, that will work for you. Let me check with our IT to see if we can schedule that production release to our web interface. That would be the best solution and easier on you. I'm going to keep this issue open until we deploy the solution.

Thank you for the report.

jnstyles5 commented 1 year ago

Hello John,

Thank you so much! I have familiarity with R and was able to install the package which ultimately worked great!

I did have some issues with the install, specifically SQL, that I wanted to make you aware of in case it is helpful for others. I used these instructions: https://github.com/ncats/RaMP-DB. I installed MySQL 5.7 because I think that was the version used to generate the dataset, but it might have been the client software version and I just misunderstood. I had no prior familiarity with SQL, so that could have contributed to my issues. The main change that I figured out was using [mysql> source /your/file/path/here/ramp_.sql] instead of [> mysql -u root -p ramp < /your/file/path/here/ramp_.sql] to load the file; the latter produced an SQL syntax error (ERROR 1064(42000)). I also noticed there were 17 files instead of the 12 files listed in the instructions. Overall the instructions were really clear and easy to follow, with the exception of this step, which didn't work for me.

I also wanted to double check that runCombinedFisherTest() is the correct function to perform the biological pathway enrichment like we would in the web version? The results seem correct, but the documentation was a little unclear, especially compared to the chemicalClassEnrichment() documentation and function naming.

Thank you for your help!

johnbraisted commented 1 year ago

Hi,

I was on holiday until today, so just saw your message.

Thanks for working with the R package while we sort out the website issues. Also, thank you for the details on the configuration/setup issues you ran into. As a side note, we are working on an SQLite version that will streamline this process. SQLite is a different database management system that works from a single file on your system. RaMP will pull in the latest version and save a copy, and the RaMP package will reference this file. It's still in the works.

runCombinedFisherTest() is the pathway enrichment method. This returns a list object where one entry is labeled 'fishresult' which is the dataframe of enrichment results.

filterFishersResults() is an optional filter that trims the returned result by an adjusted p-value. The pval_type argument is 'fdr' by default, which is a Benjamini-Hochberg corrected p-value, an estimated false discovery rate.
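To make the 'fdr' adjustment concrete, here is a minimal sketch in base R, independent of RaMP. It does not show how filterFishersResults() is implemented; it only illustrates a Benjamini-Hochberg correction followed by a cutoff filter. The data frame, its column names, the p-values, and the 0.05 cutoff are all made up for illustration.

```r
# Hypothetical raw p-values standing in for an enrichment result table.
# Pathway names and numbers are invented for this example only.
pathways <- data.frame(
  pathway = c("A", "B", "C", "D", "E"),
  pval    = c(0.001, 0.008, 0.039, 0.041, 0.60)
)

# Benjamini-Hochberg adjustment using base R (stats::p.adjust).
pathways$fdr <- p.adjust(pathways$pval, method = "BH")

# Keep only rows whose adjusted p-value is at or below the chosen cutoff.
fdr_cutoff <- 0.05  # assumed cutoff, not a RaMP default
significant <- pathways[pathways$fdr <= fdr_cutoff, ]

print(pathways)
print(significant)
```

In this toy table, pathways C and D have raw p-values below 0.05 but adjusted values of 0.05125, so they drop out after correction. That is the practical difference between filtering on a raw and on an adjusted p-value.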

The findCluster() method will cluster pathways to reduce redundancy. All pathways will be returned, but a pathway index column is added to indicate that two pathways are nearly the same, within the overlap level, which is a Jaccard similarity. A perc_pathway_overlap closer to 1.0 means that the two pathways must approach perfect overlap.

findCluster( fishers_df, perc_analyte_overlap = 0.5, min_pathway_tocluster = 2, perc_pathway_overlap = 0.5 )
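For readers unfamiliar with the Jaccard similarity mentioned above, the following base R sketch computes it for two sets of analyte IDs. It illustrates the measure only and makes no claim about how findCluster() computes or applies it internally. The helper name jaccard() and the placeholder IDs are hypothetical.

```r
# Jaccard similarity: size of the intersection divided by size of the union.
# `jaccard` is a hypothetical helper, not a RaMP function.
jaccard <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  u <- union(a, b)
  if (length(u) == 0) {
    return(NA_real_)  # undefined when both sets are empty
  }
  length(intersect(a, b)) / length(u)
}

# Placeholder analyte IDs for three made-up pathways.
pathway_x <- c("m1", "m2", "m3", "m4")
pathway_y <- c("m2", "m3", "m4", "m5")
pathway_z <- c("m9")

sim_xy <- jaccard(pathway_x, pathway_y)  # 3 shared out of 5 distinct IDs
sim_xz <- jaccard(pathway_x, pathway_z)  # no shared IDs

print(sim_xy)
print(sim_xz)
```

Here pathway_x and pathway_y share 3 of 5 distinct IDs, giving a similarity of 0.6, so they would count as overlapping at a threshold of 0.5 but not at 0.8.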

Note that clustering can be quite slow if you have a very large input and number of clusters. Filtering to significant results is a good first step before clustering.

Best, John
