jasp-stats / jasp-issues

This repository is solely meant for reporting of bugs, feature requests and other issues in JASP.

[Bug]: JASP imports csv data from large files extremely slowly #3015

Open sander-kytisaar opened 1 day ago

sander-kytisaar commented 1 day ago

JASP Version

0.19.1

Commit ID

No response

JASP Module

Unrelated

What analysis are you seeing the problem on?

No response

What OS are you seeing the problem on?

Windows 11, Flatpak

Bug Description

When importing data from .csv files around 15 MB in size, JASP takes 3-4 minutes to import the data (from half-filled progress bar to data tables visible on the screen).

When the file size grows to ~30 MB, the same process takes about 15 minutes, and larger files take even longer. Computer specs don't seem to play a role: we tested on machines ranging from PCs with 8 CPU cores, 32 GB RAM and SSD storage to much more capable VMs. We tested 0.19.1 and 0.18.3 and found the same behavior in both. For testing we used the dummy CSV datasets from https://www.datablist.com/learn/csv/download-sample-csv-files
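As a rough illustration of what these two data points imply, the sketch below estimates the exponent k in a power-law model, time ∝ size^k. The power-law model and the 3.5-minute midpoint for the 15 MB file are assumptions made for the calculation, not measurements, and `scaling_exponent` is a hypothetical helper name.

```python
import math


def scaling_exponent(size_a_mb, time_a_min, size_b_mb, time_b_min):
    """Estimate k in time ~ size**k from two (size, time) measurements."""
    return math.log(time_b_min / time_a_min) / math.log(size_b_mb / size_a_mb)


# Assumed inputs: midpoint of the reported 3-4 minutes at ~15 MB,
# and the reported ~15 minutes at ~30 MB.
k = scaling_exponent(15, 3.5, 30, 15)
print(f"estimated exponent: {k:.2f}")
```

An exponent near 1 would indicate import time growing linearly with file size. A value near 2 would suggest roughly quadratic growth, though two approximate timings are far too few to draw a firm conclusion.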

Expected Behaviour

Data is imported faster.

Steps to Reproduce

  1. Open JASP
  2. Open a file via burger menu -> Open -> Computer -> Browse ... etc
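For anyone who wants to reproduce this without downloading the sample files, here is a minimal sketch that writes a dummy CSV of roughly a chosen size. The column layout (a mix of integer, decimal and text columns), the file names and the helper `write_dummy_csv` are all assumptions for illustration. They are not the datablist files used in this report, so timings may differ.

```python
import csv
import random
import string


def write_dummy_csv(path, target_mb, n_cols=10, seed=0):
    """Write random rows until the file is at least target_mb; return the row count."""
    rng = random.Random(seed)
    target_bytes = int(target_mb * 1024 * 1024)
    rows = 0
    # newline="" lets the csv module control line endings, so the character
    # counts returned by writerow match the bytes written (content is ASCII).
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        written = writer.writerow([f"col{i}" for i in range(n_cols)])
        while written < target_bytes:
            row = []
            for i in range(n_cols):
                kind = i % 3
                if kind == 0:
                    row.append(rng.randint(0, 1_000_000))
                elif kind == 1:
                    row.append(round(rng.uniform(0, 1000), 3))
                else:
                    row.append("".join(rng.choices(string.ascii_lowercase, k=12)))
            written += writer.writerow(row)
            rows += 1
    return rows


if __name__ == "__main__":
    # Sizes chosen to match the ~15 MB and ~30 MB cases described above.
    for size_mb in (15, 30):
        n = write_dummy_csv(f"dummy_{size_mb}mb.csv", size_mb)
        print(f"dummy_{size_mb}mb.csv: {n} rows")
```

Opening each generated file via the steps above and timing the import with a stopwatch gives size and time pairs that can be compared across JASP versions.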

Log (if any)

No log output

More Debug Information

No response


github-actions[bot] commented 1 day ago

@sander-kytisaar, thanks for taking the time to create this issue. If possible (and applicable), please upload to the issue website (https://github.com/jasp-stats/jasp-issues/issues/3015, attaching to an email does not work) a screenshot showcasing the problem, and/or a compressed (zipped) .jasp file or the data file that causes the issue. If you would prefer not to make your data publicly available, you can send your file(s) directly to us, issues@jasp-stats.org

sander-kytisaar commented 1 day ago

Mostly replying to clear the "Waiting for requester" label as a screenshot would just show a half-filled progress bar (weirdly enough it progresses to the half-way point nearly instantly and then remains at a near standstill from then on). Apologies for the spam!

tomtomme commented 1 day ago

Thanks for the report. We made various improvements to this for the upcoming 0.19.2. I will test later with your data.

tomtomme commented 20 hours ago

Testing with 0.19.2 beta

Is this fast enough for your purpose? Having multiple cores does not matter, since file importing is single-threaded. Maybe this could be multithreaded to further speed up the process... But other specs do matter a lot. Opening the 333 MB file eats up 15.4 GB of RAM for JASP alone. The OS and the SSD may also have an impact. My system:

AMD Ryzen 7 5800X3D 8-Core Processor
32GB RAM
Samsung PRO NVME SSD

-------- Application Info --------
JASP Version: JASP 0.19.2
Build Branch: HEAD
Build Date: Nov 12 2024 15:25:21 (Netherlands)
Last Commit: 5c1c6947435168a8bad73f9ae351c2539fb61504

-------- Basic Info --------
Operating System: KDE Flatpak runtime
Product Version: 6.7
Kernel Type: linux
Kernel Version: 6.6.54-2-MANJARO
Architecture: x86_64
Install Path: /app/bin
Platfotm Name: wayland
System Local: de_DE

sander-kytisaar commented 2 hours ago

Thank you very much for the info, this looks to be a considerable improvement! I realize it may be bad form to ask, but do you have an estimate when 0.19.2 will be released?

tomtomme commented 2 hours ago

We are in the testing phase for 0.19.2. Some weeks I guess. Hard to say. You can check out preview versions at https://static.jasp-stats.org/Nightlies/ but those might not be fully stable.