pimcore / data-importer

This extension adds a comprehensive import functionality to Pimcore Datahub.

[Bug]: Cannot import CSV file with specified text. #371

Closed dawounit closed 10 months ago

dawounit commented 10 months ago

Expected behavior

The CSV file will be imported properly and a preview will be generated.

Actual behavior

The message "Uploaded no valid preview file." is shown instead of a preview, and the error "Uploaded file not valid, not creating any queue items and doing any cleanup." occurs when trying to execute the import.

Steps to reproduce

Create an import configuration, select CSV as the file format, and try to load a file with the following contents:

columnA;columnB
Some value;<a href=

kingjia90 commented 10 months ago

Confirming the problem. It looks like finfo gets confused when the file contains HTML code: https://github.com/pimcore/data-importer/blob/43fb4128dac8284fcfc29b7ec6d2ffed99c4d22b/src/DataSource/Interpreter/CsvFileInterpreter.php#L77

I'm thinking about a solution, but this might be a third-party issue.

See also https://core.trac.wordpress.org/ticket/47448

Also confirming that the problem persists when the HTML is quoted and the enclosure is set to " or ', and that it is not related to the HTML being in the last cell (it can be anywhere).

dawounit commented 10 months ago

I think the real problem is the extension of the uploaded file. If I upload the "data.csv" file, it is renamed to "upload.import" once it is on the server. This is misleading for "finfo_file" because a CSV file has no metadata, so detection can only rely on the contents and the extension of the file. The extension ("import") does not match any MIME type and the content contains HTML, so HTML is the best match. Keeping the "csv" file extension solves this problem without having to expand the list of accepted MIME types.

kingjia90 commented 10 months ago

That's good input, but I tried keeping the .csv file extension and finfo_file still fails.

More details: finfo is built on the same libmagic detection as the Unix file command, so the issue sits at an even lower level. https://stackoverflow.com/a/45964722

On the other hand, switching to the Symfony MIME type guesser doesn't help either, since it is based on finfo as well: https://symfony.com/doc/current/components/mime.html#guessing-the-mime-type:~:text=one%20of%20them%20based%20on%20the%20PHP%20fileinfo%20extension.
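
Since content sniffing appears unreliable for this file, one possible workaround is to check whether the upload parses as consistent CSV rather than trusting the detected MIME type alone. The following is only a minimal Python sketch of that idea, not the importer's PHP code and not necessarily what the eventual fix does. The function name `looks_like_csv` and the `min_columns` parameter are hypothetical, and the `;` delimiter is taken from the sample file in this issue.

```python
import csv
import io


def looks_like_csv(text: str, delimiter: str = ";", min_columns: int = 2) -> bool:
    """Hypothetical helper: structural CSV check instead of MIME sniffing.

    Returns True when the text has at least one non-empty row and every
    non-empty row has the same number of columns as the first row.
    """
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    if not rows:
        return False
    width = len(rows[0])
    if width < min_columns:
        # A single column is too weak a signal; plain HTML would pass too.
        return False
    return all(len(row) == width for row in rows)


# The sample from this issue: HTML-like content inside a CSV cell.
sample = "columnA;columnB\nSome value;<a href=\n"
print(looks_like_csv(sample))  # True
```

A check like this accepts the sample file regardless of what the cells contain, while a plain HTML document without the delimiter is rejected because it yields a single column. It is a heuristic: any text with a consistent number of delimiters per line would also pass.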

dvesh3 commented 10 months ago

Fixed by #374