google / patents-public-data

Patent analysis using the Google Patents Public Datasets on BigQuery
https://bigquery.cloud.google.com/dataset/patents-public-data:patents
Apache License 2.0

claim_text_extraction.ipynb df = pd.read_csv('./data/20k_G_and_H_publication_numbers.csv') workaround #70

Open austinjhicks opened 2 years ago

austinjhicks commented 2 years ago

I had a hard time getting this bit of code to work:

df = pd.read_csv('./data/20k_G_and_H_publication_numbers.csv')

I went into JupyterLab, copied the file path of 20k_G_and_H_publication_numbers.csv, and pasted it into the call. For me, the command looked like this:

df=pd.read_csv('GCS/20k_G_and_H_publication_numbers.csv')

I don't know that I'm doing that ^^ right, but it didn't load the dataframe. I found this Google Cloud Storage (GCS) bucket workaround on Stack Overflow:

Store the file in a GCS bucket.

1. Upload your file to GCS.

2. In your Notebook, type the following code, replacing the bucket and file names accordingly:

```python
import pandas as pd
from google.cloud import storage
from io import BytesIO

client = storage.Client()
bucket_name = "your-bucket"
file_name = "your_file.csv"

bucket = client.get_bucket(bucket_name)
blob = bucket.get_blob(file_name)
content = blob.download_as_string()

df = pd.read_csv(BytesIO(content))
print(df)
```

Credit to OP

^^ this workaround worked for me
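For anyone hitting the same thing: the key trick in the workaround is that `pd.read_csv` accepts any file-like object, so raw bytes downloaded from a GCS blob can be parsed without writing to local disk. A minimal local sketch of that pattern (using inline CSV bytes as a stand-in for `blob.download_as_string()` — the column names here are made up for illustration):

```python
import pandas as pd
from io import BytesIO

# Stand-in for the bytes returned by blob.download_as_string().
content = b"publication_number,cpc\nUS-1234567-A,G06F\nUS-7654321-B2,H04L\n"

# Wrap the bytes in BytesIO so pandas can treat them as a file.
df = pd.read_csv(BytesIO(content))
print(df.shape)  # (2, 2)
```

Note that recent pandas versions can also read `gs://your-bucket/your_file.csv` directly if the optional `gcsfs` dependency is installed, which skips the explicit `google.cloud.storage` client entirely.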