Issue with loading the parquet version of the Hansard data

stephbuon / digital-history

Instructional repository for "Text Mining as Historical Method"

GNU General Public License v3.0


Issue with loading the parquet version of the Hansard data #47

Closed stephbuon closed 3 years ago

stephbuon commented 3 years ago

Oddly, some students are unable to load the parquet version of the Hansard data.

When using:

import pandas as pd

hansard = pd.read_parquet("/scratch/group/oit_research_data/hansard/hansard_20191119.parquet")

hansard.head(5)

They get the error: Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'.

I confirmed these students are in the correct environment and used the correct directory paths.
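
A minimal diagnostic sketch that may help narrow this down: the error means pandas could not import either parquet engine, so it can be useful to check which interpreter the notebook is actually running and whether `pyarrow` or `fastparquet` can be located from it. The helper name `available_parquet_engines` is hypothetical, not part of the course repo, and finding a package this way does not guarantee that pandas will accept its version.

```python
import importlib.util
import sys


def available_parquet_engines(candidates=("pyarrow", "fastparquet")):
    """Return the candidate packages that can be located by this interpreter.

    Hypothetical helper for illustration only. It checks whether each package
    is discoverable, not whether its version satisfies pandas.
    """
    return [name for name in candidates if importlib.util.find_spec(name) is not None]


# The path printed here shows which environment the kernel is really using.
print("Python executable:", sys.executable)
print("Parquet engines found:", available_parquet_engines())
```

If the list comes back empty, or the executable path points outside the expected environment, the kernel is probably not using the environment that has the engines installed.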

rkalescky commented 3 years ago

The most likely cause is that students are not pulling the latest changes from the repo, or are not pulling them correctly. I encountered this issue numerous times with students I helped during the previous class.

We can leave this issue open for a while, until the few students noted in Slack have confirmed that their setups are working.