Closed emikowaight closed 5 months ago
Error summary: UnsupportedFileSystemException: No FileSystem for scheme "gs"
This error means that your system does not recognize the Google Cloud Storage file system (the "gs" scheme).
Based on your code, you were trying to extract genotype data from All of Us, but it seems you were running PheTK outside of the All of Us Researcher Workbench. To use any All of Us related features, users must be registered as researchers with the All of Us Research Program and run PheTK within their workbench. More information can be found here: https://www.researchallofus.org/
I was actually running the code within the All of Us researcher workbench, so that is why I was confused as to why it was not working.
<img width="1323" alt="Screenshot 2024-05-05 at 6 11 25 PM" src="https://github.com/nhgritctran/PheTK/assets/162065390/b7fc3b44-f6de-435a-b312-b01707c34827">
Thank you for the clarification. That is a bit odd, because the initial No FileSystem for scheme "gs" error should not occur on All of Us: the platform uses Google Cloud buckets for long-term project storage as well as for major resources such as variant data. That is why I guessed you were running PheTK outside of it.
The provided error screenshot was truncated, so I cannot tell what the exact error was, but it did appear to be related to mt_path. One possibility is that you are running an older version of PheTK (version 0.1.37 or earlier; the current version is 0.1.39) in which the default matrix table path was not updated.
Could you please provide more details on your PheTK version? You can check it by running
!pip show PheTK | grep Version
in a notebook cell. I just tested PheTK v0.1.39 on a dataproc VM on All of Us and successfully generated a cohort with your settings above.
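The same version check can also be done from Python, which is handy when the result should feed into a condition. This is only a minimal sketch: the helper names `installed_version` and `version_tuple` are made up for illustration and are not part of PheTK, and the comparison assumes plain dotted version numbers such as 0.1.39.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def version_tuple(text):
    """Turn a plain dotted version such as '0.1.39' into (0, 1, 39).

    Assumption: no suffixes like 'rc1'; those would raise ValueError here.
    """
    return tuple(int(part) for part in text.split("."))


found = installed_version("PheTK")
if found is None:
    print("PheTK is not installed in this environment")
elif version_tuple(found) <= version_tuple("0.1.37"):
    # 0.1.37 is the cutoff suggested in this thread for the old default mt_path
    print(f"PheTK {found} may predate the mt_path update; consider upgrading")
else:
    print(f"PheTK {found} is installed")
```

Comparing tuples of integers avoids the string-comparison trap where "0.1.9" would sort after "0.1.39".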
The version is listed as 0.1.38. How do I upgrade to version 0.1.39? Is that what is causing the error?
<img width="1045" alt="Screenshot 2024-05-05 at 7 05 27 PM" src="https://github.com/nhgritctran/PheTK/assets/162065390/080d34bc-4572-4e72-a903-00ee83a053d4">
<img width="1340" alt="Screenshot 2024-05-05 at 7 05 39 PM" src="https://github.com/nhgritctran/PheTK/assets/162065390/34f14a39-6908-4808-abce-acc96e21889f">
I added three screenshots because the error was so long that I could not fit it into one.
0.1.38 should already have the updated mt_path, so the version is not the cause here. You can still update PheTK by following the instructions on this GitHub: run !pip uninstall PheTK -y && pip install PheTK in your notebook and then restart the kernel.
What is your cloud environment configuration, i.e., standard VM or dataproc? You can check by clicking on the Jupyter icon on the vertical panel on the right side.
You are probably running the .by_genotype() method on a standard VM instead of a dataproc VM. This method uses hail to interact with the All of Us hail matrix tables containing variant data. Running hail on a standard VM can produce that "gs" error, too.
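As a rough illustration of the diagnosis above, a notebook could scan the error text for the "gs" scheme message and print a hint. This is a sketch only: `diagnose_gs_error` is a hypothetical helper, not part of PheTK or hail, and the hint simply restates the advice in this thread.

```python
def diagnose_gs_error(error_text):
    """Return a hint if the error text contains the 'gs' scheme failure, else None."""
    marker = 'No FileSystem for scheme "gs"'
    if marker in error_text:
        return (
            "Hail could not resolve gs:// paths. On All of Us this may mean the "
            "notebook is running on a standard VM; try a dataproc environment "
            "before calling .by_genotype()."
        )
    return None


# Error summary copied from the report in this thread
summary = 'Error summary: UnsupportedFileSystemException: No FileSystem for scheme "gs"'
print(diagnose_gs_error(summary))
```

The check is a plain substring match, so it only recognizes this exact message and stays silent for any other failure.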
If you are new to cloud environments on All of Us, this link contains information on customizing your Jupyter environment: https://support.researchallofus.org/hc/en-us/articles/18278298730644-Using-customizing-and-optimizing-Jupyter-cloud-environments
This dataproc requirement is already mentioned in this GitHub README.
Hello, I wanted to use this PheTK tool to conduct a PheWAS examining different variants in the ALDH2 gene. I tried to replace the information in the demo with the correct genotype I am looking for, but I get a fatal error when I try to run the program. From the error message, it looks like there is an issue with reading the matrix table at mt_path, and I am unsure if something was changed in the default path on All of Us. I have copied the code I ran and the initial output error message.
The last part of the error message: Hail version: 0.2.126-ee77707f4fab Error summary: UnsupportedFileSystemException: No FileSystem for scheme "gs"
FatalError                                Traceback (most recent call last)
Cell In[3], line 7
      4 cohort = Cohort(platform="aou", aou_db_version=7)
      6 # generate cohort by genotype
----> 7 cohort.by_genotype(
      8     chromosome_number=12,
      9     genomic_position=111803962,
     10     ref_allele="G",
     11     alt_allele="A",
     12     case_gt="0/1",
     13     control_gt="0/0",
     14     reference_genome="GRCh38",
     15     mt_path=None,
     16     output_file_name="aldh2_test1_cohort.csv"
     17 )
File ~/.local/lib/python3.10/site-packages/PheTK/Cohort.py:136, in Cohort.by_genotype(self, chromosome_number, genomic_position, ref_allele, alt_allele, case_gt, control_gt, reference_genome, mt_path, output_file_name)
    133 variant = hl.parse_variant(variant_string, reference_genome=reference_genome)
    135 # load and filter matrix table
--> 136 mt = hl.read_matrix_table(mt_path)
    137 mt = mt.filter_rows(mt.locus == hl.Locus.parse(locus))
    138 if mt.count_rows() == 0: