Hi Abhishaike, thank you for reaching out! This is kind of what `gget.setup('elm')` does. By default, this command saves all ELMs as TSV files in a folder called `elm_files` inside the gget installation folder. I just added a new `out` argument to `gget.setup`, which allows the user to specify an alternative download folder. This option will be part of the next gget release (v0.28.3), but you can already install it from the dev branch and try it out:
```python
# Install gget (v0.28.3 or later)
# To install from the dev branch instead, uncomment the two lines below
#!pip install -q mysql-connector-python==8.0.29
#!pip install -q git+https://github.com/pachterlab/gget.git@dev
!pip install -q gget
import gget

# Save all ELMs in the current directory
gget.setup("elm", out="./")

# Open ELM files using pandas
import pandas as pd

# Load all ELM instances
df_instances = pd.read_csv("elm_instances.tsv", sep="\t", skiprows=5)
# Load additional information about ELMs (description, functional site, etc.)
df_classes = pd.read_csv("elms_classes.tsv", sep="\t", skiprows=5)
# Load additional information about interaction domains
df_intdomains = pd.read_csv("elm_interaction_domains.tsv", sep="\t")

# Rename columns in interaction domains file to match other files
df_intdomains = df_intdomains.rename(
    columns={
        "ELM identifier": "ELMIdentifier",
        "Interaction Domain Id": "InteractionDomainId",
        "Interaction Domain Description": "InteractionDomainDescription",
        "Interaction Domain Name": "InteractionDomainName",
    }
)

# Merge information about all ELMs into a single data frame
df_elm = df_instances.merge(df_classes, how="left", on="ELMIdentifier")
df_elm = df_elm.merge(df_intdomains, how="left", on="ELMIdentifier")
df_elm
```
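One detail worth checking in the merge step: a left merge keeps every instance row, but if an ELM class is linked to more than one interaction domain, each instance is repeated once per matching domain. The sketch below illustrates this on small, made-up tables (the identifiers `TOY_A`, `DOM_1`, etc. and the `Protein` column are hypothetical, not real ELM data) and shows one possible workaround: collapsing domains to one row per ELM before merging. It assumes the classes file has one row per `ELMIdentifier`, which `validate="many_to_one"` checks at merge time.

```python
import pandas as pd

# Hypothetical toy tables (not real ELM data) shaped like the three files above:
# TOY_A has two instances and is linked to two interaction domains.
df_instances = pd.DataFrame({
    "ELMIdentifier": ["TOY_A", "TOY_A", "TOY_B"],
    "Protein": ["P1", "P2", "P3"],
})
df_classes = pd.DataFrame({
    "ELMIdentifier": ["TOY_A", "TOY_B"],
    "Description": ["toy class A", "toy class B"],
})
df_intdomains = pd.DataFrame({
    "ELMIdentifier": ["TOY_A", "TOY_A", "TOY_B"],
    "InteractionDomainId": ["DOM_1", "DOM_2", "DOM_3"],
})

# Classes should be one row per ELM, so this keeps one row per instance;
# validate="many_to_one" raises a MergeError if that assumption is violated.
df_elm = df_instances.merge(
    df_classes, how="left", on="ELMIdentifier", validate="many_to_one"
)

# A plain left merge with the domains repeats each instance once per domain.
fanned_out = df_elm.merge(df_intdomains, how="left", on="ELMIdentifier")

# Collapse domains to one row per ELM (IDs joined with ";") before merging.
domains_per_elm = (
    df_intdomains.groupby("ELMIdentifier")["InteractionDomainId"]
    .agg(lambda ids: ";".join(sorted(ids)))
    .reset_index()
)
one_row_per_instance = df_elm.merge(
    domains_per_elm, how="left", on="ELMIdentifier", validate="many_to_one"
)

print(len(df_instances), len(fanned_out), len(one_row_per_instance))  # → 3 5 3
```

Which form is preferable depends on the downstream analysis: the fanned-out table is easier to filter by domain, while the collapsed one keeps instance counts intact.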
Please note that the dev branch is currently undergoing active development, and there might be breaking changes. Does this solve your request?
Edit: I also added information about interaction domains.
Edit: v0.28.3 will be released today, so moving forward, there is no need to install gget from the dev branch for this.
I'm going to go ahead and close this issue, but please let me know if the proposed solution does not work.
Just saw this, thank you!
I am noticing that a lot of the interaction domain columns are missing: `df_intdomains` only contains 4 columns. Is there a way to add in the affinity and the start/stop ELM columns?
Poking at this again!
Request type
Extension of existing module
Request description
Can I get all ELMs using just gget?
Example command
Example return value
`regex_df` would contain all ELMs
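If `regex_df` is meant for scanning protein sequences, the per-class regular expressions can be applied directly with Python's `re` module. The minimal sketch below assumes the classes are available as (identifier, regex) pairs and that every pattern compiles with `re`; the function `scan_elm_regexes` and the `TOY_MOTIF_*` patterns are hypothetical, made up for illustration rather than taken from ELM or gget. Wrapping each pattern in a zero-width lookahead lets overlapping hits be reported, which a plain `re.finditer` would skip.

```python
import re
from typing import Iterable, List, Tuple


def scan_elm_regexes(
    sequence: str, classes: Iterable[Tuple[str, str]]
) -> List[Tuple[str, int, int, str]]:
    """Return (identifier, start, end, matched_text) for every regex hit.

    start/end are 1-based and inclusive. `classes` is an iterable of
    (identifier, regex) pairs (a hypothetical input format).
    """
    hits = []
    for identifier, pattern in classes:
        # The lookahead consumes nothing, so the search resumes at the next
        # residue and overlapping matches are kept; group 1 holds the match.
        overlapping = re.compile(f"(?=({pattern}))")
        for m in overlapping.finditer(sequence):
            hits.append((identifier, m.start(1) + 1, m.end(1), m.group(1)))
    return hits


# Hypothetical toy patterns, not real ELM classes.
toy_classes = [("TOY_MOTIF_1", "R.[ST]"), ("TOY_MOTIF_2", "P..P")]
for hit in scan_elm_regexes("MRASPAAPRKT", toy_classes):
    print(hit)
# ('TOY_MOTIF_1', 2, 4, 'RAS')
# ('TOY_MOTIF_1', 9, 11, 'RKT')
# ('TOY_MOTIF_2', 5, 8, 'PAAP')
```

With the `df_classes` table loaded from the gget download, and assuming its pattern column is named `Regex` (worth confirming with `df_classes.columns`), the input could be built as `zip(df_classes["ELMIdentifier"], df_classes["Regex"])`.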