visual-layer / fastdup

fastdup is a powerful free tool designed to rapidly extract valuable insights from your image and video datasets, helping you improve the quality of your dataset's images and labels and reduce your data operations costs at an unparalleled scale.

[Bug]: Error in atrain_crops csv build #348

Open TamirBar-Tov opened 2 hours ago

TamirBar-Tov commented 2 hours ago

What happened?

Hi! I ran the code on Windows (PyCharm) and the atrain_crops file isn't built properly. When I run the same code on Colab it works well. The problem seems to be somewhere in the dll.do_main function.

filename: ???????? ???????? ??????????????? ???????? ???????? ???????? ???????? ???????? ??????????? ????????

What did you expect to see?

No response

What version of fastdup were you running on?

2.2

What version of Python were you running on?

Python 3.9

Operating System

Windows

Reproduction steps

No response

Relevant log output

```
index  filename         crop_filename                           col_x  row_y  width  height  label    confidence
0      ????????         ??????????????????????????g460_64.dll   1525   806    1989   1096    ?g       0.76063
1      ????????         ???????????????????????????             3506   1310   4053   1540    ?g       0.66642
2      ???????????????  ??????????????????????????g_1462.jpg    2606   58     3412   1520    ??e???r  0.67283
3      ????????         ??????????????????????????g2_168.jpg    2279   777    2621   945     ?g       0.35012
4      ????????         ???????????????????????????             1276   1283   1560   1857    ?g       0.4677
5      ????????         ???????????????????????????             34     3112   351    3227    ??????t  0.33369
6      ????????         ???????????????????????????g173.jpg     5022   1566   5670   2873    ?t       0.78459
7      ????????         ???????????????????????????g115.jpg     43     4979   561    5152    ???????  0.32928
8      ???????????      ???????????????????????????g30_454.jpg  2628   1030   3658   1484    ?g       0.64583
9      ????????         ??????????????????????????g             2090   822    2382   1077    ?g       0.64642
```

Attach a screenshot [Optional]

No response

Contact Details [Optional]

bartov7@gmail.com

dbickson commented 2 hours ago

Hi @TamirBar-Tov, this is related to locale encoding on Windows. You need to compare the environment variables between Jupyter and PyCharm to see where the issue is coming from.
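One way to follow this suggestion is to run the same small script from both PyCharm and Jupyter/Colab and compare the output line by line. The sketch below collects the encoding-related settings Python exposes; the helper name `encoding_report` is hypothetical, and which of these values actually affects fastdup's native code is an assumption, not something confirmed in this thread.

```python
import locale
import os
import sys


def encoding_report() -> dict:
    """Collect encoding-related settings of this Python process (hypothetical helper)."""
    info = {
        "platform": sys.platform,
        # Default encoding used by open() when no encoding= argument is given
        "preferred_encoding": locale.getpreferredencoding(False),
        # Encoding used to convert file names between str and bytes
        "filesystem_encoding": sys.getfilesystemencoding(),
        # Encoding of the stdout stream (None if stdout has no encoding attribute)
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        # 1 if Python's UTF-8 mode is on (PYTHONUTF8=1 or -X utf8), else 0
        "utf8_mode": sys.flags.utf8_mode,
    }
    # Environment variables that can influence the values above
    for name in ("PYTHONIOENCODING", "PYTHONUTF8", "LANG", "LC_ALL"):
        info[name] = os.environ.get(name)
    if sys.platform == "win32":
        import ctypes
        # Active ANSI code page used by non-Unicode Windows programs (65001 = UTF-8)
        info["ansi_code_page"] = ctypes.windll.kernel32.GetACP()
    return info


if __name__ == "__main__":
    for key, value in encoding_report().items():
        print(f"{key}: {value}")
```

Any line that differs between the two environments (for example `preferred_encoding` being `cp1252` in PyCharm but `UTF-8` on Colab) is a candidate for the cause.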

This is from ChatGPT:

Font issues in PyCharm versus Jupyter when using pandas.to_csv could be influenced by several environment variables or settings:

Locale Settings: Ensure that the locale in PyCharm is set correctly. Jupyter may be using UTF-8 encoding by default, while PyCharm could be defaulting to a different encoding (e.g., ANSI).

Check the PYTHONIOENCODING environment variable in PyCharm. Set it to utf-8 if not already set:

PYTHONIOENCODING=utf-8
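Note that `PYTHONIOENCODING` only overrides the encoding of `sys.stdin`, `sys.stdout` and `sys.stderr`; it does not change the default encoding that `open()` uses for files. Python's UTF-8 mode (`PYTHONUTF8=1`, PEP 540) switches both. The sketch below starts a fresh interpreter with each variable set and prints what it sees, which is a quick way to check what a PyCharm run configuration actually passes through. The helper name `child_encodings` is hypothetical, and whether either variable changes the behaviour of fastdup's compiled code is an assumption.

```python
import os
import subprocess
import sys

# Child script: print the stdout encoding, then the default file encoding
PROBE = (
    "import locale, sys; "
    "print(sys.stdout.encoding); "
    "print(locale.getpreferredencoding(False))"
)


def child_encodings(extra_env: dict) -> tuple:
    """Run a new interpreter with extra environment variables (hypothetical helper)."""
    env = {**os.environ, **extra_env}
    lines = subprocess.run(
        [sys.executable, "-c", PROBE],
        env=env, capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    return lines[0], lines[1]  # (stdout encoding, default file encoding)


if __name__ == "__main__":
    # PYTHONIOENCODING affects only the standard streams
    print("PYTHONIOENCODING=utf-8 ->", child_encodings({"PYTHONIOENCODING": "utf-8"}))
    # PYTHONUTF8=1 makes both the streams and open()'s default UTF-8
    print("PYTHONUTF8=1           ->", child_encodings({"PYTHONUTF8": "1"}))
```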

Console Encoding: PyCharm's console may not handle special characters or encodings as well as Jupyter's interface. Check the console encoding settings in PyCharm:

Go to File > Settings > Editor > File Encodings and ensure UTF-8 is set for "Global Encoding" and "Project Encoding."

Pandas Display Options: Pandas might display text differently based on the environment. Try enforcing the encoding when saving the CSV file:

```python
# df is the pandas DataFrame being saved
df.to_csv('file.csv', encoding='utf-8')
```

Font in PyCharm: If PyCharm is using a font that doesn't support special characters, change the font under File > Settings > Editor > Font to something like Consolas or another monospaced font that supports Unicode.
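To tell whether the CSV file itself is damaged or only its display is, a round trip like the one below can help: it writes non-ASCII file names with an explicit encoding, reads them back, and returns the raw bytes. The Hebrew sample names and the file name `crops_check.csv` are made up for illustration. If the file will be opened in Excel on Windows, `encoding='utf-8-sig'` (UTF-8 with a byte-order mark) is a common choice, since without the mark Excel tends to assume the ANSI code page.

```python
import os
import tempfile

import pandas as pd


def roundtrip(names, encoding="utf-8"):
    """Write names to a CSV with an explicit encoding, read them back, return raw bytes too."""
    df = pd.DataFrame({"filename": names})
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "crops_check.csv")  # hypothetical file name
        df.to_csv(path, index=False, encoding=encoding)
        back = pd.read_csv(path, encoding=encoding)
        with open(path, "rb") as fh:
            raw = fh.read()
    return back["filename"].tolist(), raw


if __name__ == "__main__":
    sample = ["תמונה_1.jpg", "כלב_2.jpg"]  # made-up non-ASCII file names
    names, _ = roundtrip(sample)
    print(names == sample)  # the UTF-8 round trip should preserve the names
    _, raw_sig = roundtrip(sample, encoding="utf-8-sig")
    print(raw_sig.startswith(b"\xef\xbb\xbf"))  # utf-8-sig writes a byte-order mark
```

If the names survive this round trip but still show up as `?` in the fastdup output, the loss is probably happening before pandas writes the file.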

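System Locale: On Windows, programs that are not Unicode-aware use the code page of the system locale, which may differ from the UTF-8 default of a Linux-based environment such as Colab. Ensure that the locale for non-Unicode programs is set to a UTF-8 compatible option (for example, the "Beta: Use Unicode UTF-8 for worldwide language support" checkbox under Region settings > Administrative > Change system locale).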
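The question marks in the log output are consistent with text being converted to a legacy code page that cannot represent its characters. For example, Windows-1252, the usual ANSI code page on Western-European Windows installs, has no Hebrew letters, so a lossy conversion replaces each one with `?`; Python's `errors='replace'` mimics that. The sketch below assumes this is what happens on the `dll.do_main` path, which the thread does not confirm, and the sample file name is made up.

```python
def to_legacy_code_page(text: str, code_page: str = "cp1252") -> str:
    """Simulate a lossy conversion of text to a legacy ANSI code page (assumed cause)."""
    # Characters the code page cannot represent are replaced with '?'
    return text.encode(code_page, errors="replace").decode(code_page)


def fits_code_page(text: str, code_page: str) -> bool:
    """Return True if every character of text can be encoded in code_page."""
    try:
        text.encode(code_page)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    name = "כלב_2.jpg"  # made-up non-ASCII file name
    print(to_legacy_code_page(name))       # ???_2.jpg
    print(fits_code_page(name, "cp1252"))  # False
    print(fits_code_page(name, "utf-8"))   # True
```

Running `fits_code_page` on the real file names against the code page reported by `GetACP()` would show which names cannot survive a non-Unicode conversion.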