hannesdatta / covid-19-book-consumption

MSc thesis project examining the influence of COVID-19 on book reading behaviour

Test workflow #1

Closed hannesdatta closed 2 years ago

hannesdatta commented 2 years ago

Please test-run the workflow, report any occurring issues, and come up with a list of suggestions on how to improve the workflow.

Minimum requirements are:

srosh2000 commented 2 years ago

Hi @hannesdatta ,

I am currently stuck: I cannot read the large all_books.csv file, which is almost 4 GB. Error: Limited virtual memory image

I am currently working on a t2.micro EC2 instance. Would the solution to this issue be to upgrade to another instance type with more RAM?

I found this thread, which deals with this issue, but the solution is still not clear to me: https://github.com/Rdatatable/data.table/issues/3526

hannesdatta commented 2 years ago

Totally, this is a memory issue caused by your small instance. Try one of the medium or large ones and pause it when you are not working on it. Save the bills from AWS so I can reimburse you eventually. Keep a maximum cloud budget for this month, say 100 EUR, and get in touch should it not suffice. OK?

Further, you can test the workflow on small datasets first and only then take it to the big machine. Make a sensible decision here.
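One way to get such a small dataset without loading the full file into memory is to stream the first rows into a separate file. The following is only a minimal sketch, assuming Python is available on the instance (the workflow itself may use a different language); the function name `make_sample`, the output file name, and the row count are hypothetical and not part of the repository.

```python
import csv
import itertools


def make_sample(src_path, dst_path, n_rows=10000):
    """Copy the header and the first n_rows data rows of a CSV file.

    Rows are streamed one at a time, so memory use stays small
    regardless of the size of the source file.
    Returns the number of data rows written.
    """
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)

        # The first row is assumed to be the header (an assumption
        # about the file layout, not confirmed in this thread).
        header = next(reader, None)
        if header is None:
            return 0  # empty source file, nothing to copy
        writer.writerow(header)

        written = 0
        for row in itertools.islice(reader, n_rows):
            writer.writerow(row)
            written += 1
        return written
```

For example, `make_sample("all_books.csv", "all_books_sample.csv", n_rows=10000)` would write a small file to test against. Note that the first rows of a file are not necessarily representative of the whole dataset, so this is suited to checking that the workflow runs, not to checking results.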

Thanks for your work!

srosh2000 commented 2 years ago

I just finished testing the entire workflow. After some trials with different instance types, I used a t2.xlarge instance for a smoother and faster workflow. I have versioned the minor changes I made in my fork of the repo: https://github.com/srosh2000/covid-19-book-consumption.git.

To quickly spell out the minor changes made/suggestions for improvement per src file:

hannesdatta commented 2 years ago

Discussed, and the workflow has been tested by @srosh2000. Thanks!