FeLoe closed this 6 years ago
Now all the importers should work for files as well as folders.
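For readers following along, here is a minimal sketch of how an importer could accept either a single file or a folder. The helper name `resolve_input_paths` and the `extension` parameter are hypothetical and are not taken from the project's code.

```python
import os


def resolve_input_paths(path, extension=".csv"):
    """Return a sorted list of files to import for a file or folder path.

    Hypothetical helper for illustration only, not the project's importer API.
    """
    if os.path.isdir(path):
        # Folder: collect every regular file with the expected extension.
        return sorted(
            os.path.join(path, name)
            for name in os.listdir(path)
            if name.endswith(extension)
            and os.path.isfile(os.path.join(path, name))
        )
    if os.path.isfile(path):
        # Single file: wrap it in a list so callers can always iterate.
        return [path]
    raise FileNotFoundError(path)
```

Returning a list in both cases lets the import loop stay the same whether the user passed a file or a folder.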
Perfect, thanks! Maybe @mariekevh can test it on Wednesday?
Yes, I will look at it this Wednesday :)
I checked everything again and found a couple of things that caused problems:
By default the HTML source is exported, and in that case the file cannot be imported again (field too large). Should we make this an explicit option (such as include_html)? I don't assume many people would actually want to export it to CSV. This would prevent issues when re-importing the data; otherwise one always has to explicitly list all the fields except HTML to get a re-importable file.
If I export multiple files, only the first one gets headers. Every file I export after that has no headers (and thus cannot be imported again). I guess it has something to do with this:
    if self.fileobj and not self.fileobj.closed:
        outputfile = self.fileobj
        new = False
    elif self.fileobj:
        outputfile = self._makefile(destination, mode='a')
        new = False
If this is desired behaviour it should be noted somewhere, but I think it is a little strange that one can only export one file with headers.
But: if I select only the non-HTML fields and the file has headers, the import function works just fine for me 😉
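One possible way to get headers in every exported file is to decide per destination file, rather than per exporter instance, whether a header is needed. The sketch below illustrates that idea under stated assumptions: `append_rows` is a hypothetical helper, not the project's `export_csv` code and not necessarily the fix that was eventually merged.

```python
import csv
import os


def append_rows(path, fieldnames, rows):
    """Append dict rows to a CSV file, writing the header only once per file.

    Hypothetical helper for illustration; not the project's exporter.
    """
    # A header is needed when the destination does not exist yet or is empty.
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, mode="a", newline="", encoding="utf-8") as f:
        # extrasaction="ignore" silently drops keys that are not in fieldnames.
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        if needs_header:
            writer.writeheader()
        writer.writerows(rows)
```

With this check, appending to an existing file never duplicates the header, while each new file starts with one and can therefore be re-imported.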
Thanks, @FeLoe. I think HTML export should be optional; the vast majority of users won't need or want it (those who do probably export to JSON anyway).
Regarding the headers: I was not aware of this, but I actually find the current behaviour desirable, because it allows concatenating files without having headers in between:
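A minimal sketch of what an opt-in HTML export could look like. The function name `select_export_fields` and the field name `htmlsource` are assumptions made for illustration; only the `include_html` option name comes from the suggestion above.

```python
def select_export_fields(doc, include_html=False, html_field="htmlsource"):
    """Return a copy of a document dict, dropping raw HTML unless requested.

    Hypothetical helper; `html_field` is an assumed field name.
    """
    if include_html:
        return dict(doc)
    # Leave out the (potentially very large) HTML field by default.
    return {key: value for key, value in doc.items() if key != html_field}
```

Python's `csv` module rejects fields above a configurable size limit (see `csv.field_size_limit`), which is presumably where the "field too large" error on re-import comes from, so leaving the HTML out by default sidesteps that.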
cat output1.csv output2.csv > everythinginonelargefile.csv
or
cat output*.csv > everythinginonelargefile.csv
But you are right that this should be noted somewhere and/or be optional.
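If every exported file does carry its own header, plain `cat` would leave header rows in the middle of the combined file. A small sketch of a header-aware alternative, assuming all input files share the same columns; `concat_csv` is a hypothetical helper, not part of the project.

```python
import csv


def concat_csv(paths, destination):
    """Concatenate CSV files that each start with a header, keeping one header.

    Hypothetical helper for illustration only.
    """
    header = None
    with open(destination, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for path in paths:
            with open(path, newline="", encoding="utf-8") as f:
                reader = csv.reader(f)
                first = next(reader, None)
                if first is None:
                    continue  # skip empty files
                if header is None:
                    # The first non-empty file provides the single header row.
                    header = first
                    writer.writerow(header)
                elif first != header:
                    raise ValueError(f"{path} has a different header")
                # Copy the remaining data rows unchanged.
                writer.writerows(reader)
```

This keeps both use cases possible: each file is importable on its own, and the files can still be merged into one large file.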
Fixed the last issues with the exporter (it no longer exports images and only exports HTML if necessary). Now the exported documents can be imported again. Also, the Telegraaf scraper is now fixed (it had some issues with the titles).
Maybe @mariekevh can have a final look and check? Then I'll merge (and resolve the conflicts that seem to be there)
@damian0604 Actually sitting next to Felicia :)
Works fine! @damian0604 The conflict arises because, while Felicia was working on this, I solved the no-headers issue in export_csv in PR #391.
The importer function should now work without supplying a doctype.