jbsparrow / CyberDropDownloader

Bulk Gallery Downloader for Cyberdrop.me and Other Sites
GNU General Public License v3.0

[BUG] CSV formatting issues #251

Closed baccccccc closed 1 day ago

baccccccc commented 5 days ago

Hello! First of all, thanks for following up on #210 and converting logs to CSV!

I found a few irregularities. There may be more; I have not looked very thoroughly yet.

  1. Unsupported_URLs.csv doesn't have headers.
  2. All files seem to add a space between the previous value and a comma.
  3. There's no line break between the headers and the first row.

NTFSvolume commented 5 days ago

I cannot reproduce issues 1 and 3; they may be OS-specific. What operating system are you running CDL on?

Issue 2 is intentional. The first column is the URL, and without the padding, every editor (VS Code, Notepad, micro, etc.) parses the next word as part of the URL instead of stopping at the end of the first column. If you try to click on it, it will open an invalid page, probably a 404.

Ex: https://github.com/jbsparrow/CyberDropDownloader,CSV formatting issues,251 is a valid CSV row of 3 columns, no restricted characters. But editors will parse the URL as https://github.com/jbsparrow/CyberDropDownloader,CSV

Other workarounds could be:

  1. Padding only the URL
  2. Adding quotes to the URL

Padding only the URL will make the formatting inconsistent. Quoting would require quoting every value of every column. Either one could be implemented, but it comes down to personal preference.
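To make the trade-off concrete, here is a minimal sketch using Python's standard `csv` module. It reuses the example row from this comment and compares how a CSV-aware parser sees the padded form versus a fully quoted form. The padded line is built by hand to imitate the log output described in this issue; it is an assumption about the format, not CDL's actual writer code.

```python
import csv
import io

row = ["https://github.com/jbsparrow/CyberDropDownloader", "CSV formatting issues", "251"]

# Padded variant (assumed format): a space before every comma.
padded_line = " ,".join(row)
parsed_padded = next(csv.reader([padded_line]))
# The padding becomes part of every value except the last one.

# Quoted variant: csv.QUOTE_ALL wraps every value in double quotes.
buffer = io.StringIO(newline="")
csv.writer(buffer, quoting=csv.QUOTE_ALL).writerow(row)
quoted_line = buffer.getvalue()
parsed_quoted = next(csv.reader(io.StringIO(quoted_line, newline="")))

print(repr(parsed_padded[0]))  # URL with a trailing space
print(quoted_line.strip())     # every value wrapped in quotes
print(parsed_quoted == row)    # values round-trip unchanged
```

With quoting, the closing `"` sits between the URL and the comma, so the parsed values stay clean for tools that understand CSV.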

baccccccc commented 4 days ago

Yeah, I probably should have filed three separate issues for that.

  1. This is definitely a thing, and I can hardly see how it could be platform-specific. However, I was wrong that there are no headers whatsoever. Somehow, they appear in the middle of the file, e.g., in the 2nd row or below. Sometimes there is even more than one header row, in different places of the file. So it might be something random and not always reproducible. However, based on my limited testing so far, this only happens to Unsupported_URLs.csv and no other files. (The remaining two issues happen to all .csv files consistently.)
  2. Yeah, let's discuss the other options. I would assume that people who open a CSV do not do so with a general-purpose text editor. Instead, they use a tool that is aware of the CSV format. I personally tend to use PowerShell; other folks might use Excel and the like. In those cases, the extra space messes up the data. (I know I could trim every value, but that just adds extra work.)
  3. This is actually my bad. Upon further digging, there are in fact two line breaks between each line. Or, more precisely, a sequence of 0x0D, 0x0D, 0x0A, or CRCRLF. This is a bit weird, although apparently a known issue with some apps, based on this SO thread.
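One common way to end up with CRCRLF in Python is worth sketching here, with the caveat that the thread does not confirm this is what CDL does. `csv.writer` terminates rows with `\r\n` by default, and a text stream that translates `\n` to `\r\n` (the default for files opened in text mode on Windows) then turns that into `\r\r\n`. The snippet emulates that translation with `io.TextIOWrapper` so it behaves the same on any OS; the column names are placeholders.

```python
import csv
import io


def write_header(newline):
    """Write one CSV row through a text layer and return the raw bytes."""
    raw = io.BytesIO()
    text = io.TextIOWrapper(raw, encoding="utf-8", newline=newline)
    csv.writer(text).writerow(["url", "reason"])  # placeholder column names
    text.flush()
    return raw.getvalue()


# newline="\r\n" mimics Windows text mode: every "\n" written becomes "\r\n".
windows_text_mode = write_header("\r\n")

# newline="" disables translation, as the csv module documentation recommends.
recommended = write_header("")

print(windows_text_mode)  # b'url,reason\r\r\n'
print(recommended)        # b'url,reason\r\n'
```

If this is the cause, opening the log files with `newline=""` would be enough to remove the extra carriage return.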

Now, the thing is that different Windows apps seem to handle this situation differently, which is not great. More specifically:

NTFSvolume commented 3 days ago

  1. This sounds like a race condition. Adding a lock to the file should prevent this by making sure rows are written in the same order as they are queued.
  2. I will switch to quoting all column values. This will make the files machine-parsable and still fix the URL issue in text editors.
  3. We can remove the extra carriage return to fix this.
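A rough sketch of the locking idea from item 1, assuming plain threads rather than whatever concurrency model CDL actually uses. `LockedCsvLog`, its methods, and the column names are hypothetical and only illustrate the pattern: the "is the file empty?" check, the header write, and the row write all happen while holding the same lock, so the header cannot land after a row or appear twice.

```python
import csv
import tempfile
import threading
from pathlib import Path


class LockedCsvLog:
    """Hypothetical helper that serializes all writes to one CSV file."""

    def __init__(self, path, header):
        self._path = Path(path)
        self._header = header
        self._lock = threading.Lock()

    def write_row(self, row):
        # Check-then-write is atomic with respect to other writers of this log.
        with self._lock:
            is_new = not self._path.exists() or self._path.stat().st_size == 0
            with self._path.open("a", newline="", encoding="utf-8") as handle:
                writer = csv.writer(handle, quoting=csv.QUOTE_ALL)
                if is_new:
                    writer.writerow(self._header)
                writer.writerow(row)


header = ["url", "referer"]  # placeholder column names
log_path = Path(tempfile.mkdtemp()) / "Unsupported_URLs.csv"
log = LockedCsvLog(log_path, header)

threads = [
    threading.Thread(target=log.write_row, args=([f"https://example.com/{i}", "n/a"],))
    for i in range(20)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

with log_path.open(newline="", encoding="utf-8") as handle:
    rows = list(csv.reader(handle))

print(rows[0])    # the header row comes first
print(len(rows))  # one header plus 20 data rows
```

The sketch also uses `csv.QUOTE_ALL` and `newline=""`, matching items 2 and 3. In an asyncio code base the same shape would apply with `asyncio.Lock` in place of `threading.Lock`.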

All of these are easy fixes. They will be implemented in the next version, but it might take a few days to be released because there is some other big stuff going on that also needs to be addressed.

Thanks for the report.