jekyll / jekyll-import

:inbox_tray: The "jekyll import" command for importing from various blogs to Jekyll format.
https://import.jekyllrb.com
MIT License

Re-implement Dotclear importer #512

Closed ashmaroli closed 1 year ago

ashmaroli commented 1 year ago

Re-implement Dotclear importer based on export file provided by @jrfern in https://github.com/jekyll/jekyll-import/issues/510#issuecomment-1453747018.

This drops dependency on activesupport, includes associated tests and adds provided export file for future development.

Closes #510

ashmaroli commented 1 year ago

Hello @jrfern, the Dotclear importer has been rewritten based on the export file you provided. I would now like to know the directory structure of the "media folder" (media.zip unpacked) so that I can implement the functionality behind --mediafolder and maintain backwards compatibility.

If you wish to try this out, you may edit your Gemfile as follows:

# Gemfile

gem "jekyll"
gem "jekyll-import", github: "jekyll/jekyll-import", ref: "refs/pull/512/head"

(There is no need to include activesupport or any of the previous dependency gems).


TODO:

jrfern commented 1 year ago

Great! Thank you very much. Now I get

invalid option: --mediafolder (OptionParser::InvalidOption)

When run without this option (and after following your instructions)

$ bundle exec jekyll import dotclear --datafile path_to_backup.txt
jekyll 4.3.2 | Error:  Illegal quoting in line 1.
/usr/lib/ruby/3.1.0/csv/parser.rb:955:in `parse_quotable_robust': Illegal quoting in line 1. (CSV::MalformedCSVError)

I would now like to know the directory structure of the "media folder" (media.zip unpacked)

Inside the zip archive there's an "img" directory with the image files and subdirectories.

ashmaroli commented 1 year ago

@jrfern The CSV::MalformedCSVError is a bug that needs to be fixed. I would like to take a look at the actual backup file you used to test this branch. You may email the file to me directly instead of exposing it here. (email address is attached to all of my commits on GitHub)
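For readers unfamiliar with this error class, here is a minimal sketch of how Ruby's strict CSV parser can reject a record whose inner quotes are backslash-escaped, as in the backup excerpts quoted later in this thread. The sample record and both helper names are hypothetical, and the quote-doubling workaround is only one possible approach, not necessarily what the importer ended up doing.

```ruby
require "csv"

# Hypothetical record: the inner quotes are backslash-escaped (\") rather than
# doubled (""), which is what Ruby's CSV parser expects inside quoted fields.
SAMPLE_LINE = '"Title","<p style=\"text-align: justify;\">Body</p>"'

# Strict parsing: returns the parsed row, or the error object if parsing fails.
def parse_strict(line)
  CSV.parse_line(line)
rescue CSV::MalformedCSVError => e
  e
end

# Assumed workaround (illustration only): turn \" into "" before parsing.
# This does not handle other backslash sequences such as \\ or \r\n.
def parse_backslash_escaped(line)
  CSV.parse_line(line.gsub('\\"', '""'))
end

puts parse_strict(SAMPLE_LINE).class
p parse_backslash_escaped(SAMPLE_LINE)
```

The first call shows that the unmodified line is rejected; the second shows the same line parsing into two fields once the quotes are doubled.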

I would now like to know the directory structure of the "media folder" (media.zip unpacked)

Inside the zip archive there's an "img" directory with the image files and subdirectories.

In the backup file you provided previously, the value of the key media.media_file is "MiUser/250px-MonaLisaGraffiti.JPG". So, is the "img" directory the parent of "MiUser"?

jrfern commented 1 year ago

@jrfern The CSV::MalformedCSVError is a bug that needs to be fixed.

Yes, please, @ashmaroli

I would like to take a look at the actual backup file you used to test this branch. You may email the file to me directly instead of exposing it here. (email address is attached to all of my commits on GitHub)

ashmaroli at users.noreply.github.com? Impossible. I'm feeling silly, but I haven't been able to find your email, just your jekyll-talk, github, reddit, linkedin accounts... Mine is jrfern at gmail...

I would now like to know the directory structure of the "media folder" (media.zip unpacked). In the backup file you provided previously, the value of the key media.media_file is "MiUser/250px-MonaLisaGraffiti.JPG". So, is the "img" directory the parent of "MiUser"?

I don't understand; the MiUser part was a reference to the path. I unzipped the media.zip file, and it created media/img/image_files. Then I ran the command with --mediafolder path/media/img/ (as it never worked, I don't know whether it should simply be --mediafolder path/media/).

Hope this helps. One more thing: for my tests you suggested

gem "jekyll-import", github: "jekyll/jekyll-import", ref: "refs/pull/512/head"

Should I change that now that the PR has been approved?

ashmaroli commented 1 year ago

I'm feeling silly, but I haven't been able to find your email..

Ah! I should have just mentioned it right away instead.. it's ashmaroli at gmail..

run the command with --mediafolder path/media/img/ (as it never worked I don't know if it should be simply --mediafolder path/media/ )...

The original implementation (in existing releases) expected just path/media/. The importer would then copy the contents into the destination path assets/images/. For example, say I provide --mediafolder media. The importer would then look for media/MiUser/250px-MonaLisaGraffiti.JPG and, if found, copy it to assets/images/MiUser/250-px....JPG. The proposed implementation in this branch hasn't actually exposed the --mediafolder option yet (so it will always fail if you try), but it will eventually behave similarly to maintain backwards compatibility.
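The lookup-and-copy behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the importer's actual code: the helper name `copy_media` and its parameters are hypothetical, and only the default destination `assets/images` comes from the description.

```ruby
require "fileutils"

# Hypothetical helper: copy one media file from the unpacked media folder into
# the destination directory, preserving its relative path (e.g. "MiUser/x.JPG").
# Returns the target path, or nil when the source file does not exist.
def copy_media(media_root, relative_path, dest_root = File.join("assets", "images"))
  source = File.join(media_root, relative_path)
  return nil unless File.file?(source)

  target = File.join(dest_root, relative_path)
  FileUtils.mkdir_p(File.dirname(target)) # create assets/images/MiUser/ if needed
  FileUtils.cp(source, target)
  target
end
```

With `--mediafolder media`, the sketch would be called as `copy_media("media", "MiUser/250px-MonaLisaGraffiti.JPG")`, where the second argument is the value stored under media.media_file in the export.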

Should I change that now that the PR has been approved?

The reference is permanent. It would be valid even if the pull request branch gets deleted after the pull request is merged. However, since the pull request is still a work-in-progress, you may have to run bundle update jekyll-import to get the latest state of this branch. (You don't have to update until I ask you for feedback.)

jrfern commented 1 year ago

@ashmaroli Real backup file sent privately. I'm learning so much - thank you again.

ashmaroli commented 1 year ago

Thanks @jrfern Received the backup file. Will use it to make changes to this branch.

ashmaroli commented 1 year ago

Hello @jrfern You may update your bundle reference to this branch by running bundle update jekyll-import to test at your end. I have also updated the importer documentation for better understanding. You may preview the document here.

jrfern commented 1 year ago

Recovered 60 entries into _drafts, along with their images! Great! I'm fighting with the paginate-v2 plugin at the moment and so can't check, but I would say that the import worked.

Thank you again, @ashmaroli

ashmaroli commented 1 year ago

Happy to hear that, @jrfern. Good luck tackling the pagination plugin 🙂 Thank you for testing and giving feedback.

jrfern commented 1 year ago

First analysis of the new plugin. I moved the older post ('Informe K-12 Open Minds Conference 2007 - parte I: Europeos') to the posts directory.

Works quite well, not totally well.

The excerpt part is missing (it's in the backup). For example:

"Informe K-12 Open Minds Conference 2007 - parte I: Europeos","<blockquote>\r\n<p><em>I was invited to&nbsp; attend the Conference held in Indianapolis. It was the start of something, I have to say. This is part one of my report in Spanish.</em></p>\r\n\r\n<p>La ventaja de dar tiempo a las cosas para ...
...
... y perfilar matices.</p>\r\n</blockquote>\r\n\r\n<p> </p>","<p style=\"text-align: justify;\">Escribo un informe sobre la K-12 Open Minds Conference....

This is converted into

<p style="text-align: justify;">Escribo un informe sobre la K-12 Open Minds Conference. Si eres impaciente puedes leer ya mucha información sobre lo que allí se habló en el <a href="http://k12openminds.wikispaces.com/" hreflang="es">K-12 Open Minds Conference Resource Site</a>.</p>

The blockquote (the whole header) is missing in the import.

ERROR `/assets/dotclear/img/.dia_1_m.jpg' not found.

In the backup

<p style=\"text-align: justify;\"><a class=\"media-link\" href=\"/dotclear/public/img/dia_1.jpg\"><img alt=\"\" class=\"media\" src=\"/dotclear/public/img/.dia_1_m.jpg\" style=\"float: left; margin: 0 1em 1em 0;\" /></a>

Now it is

<p style="text-align: justify;"><a class="media-link" href="/assets/dotclear/img/dia_1.jpg"><img alt="" class="media" src="/assets/dotclear/img/.dia_1_m.jpg" style="float: left; margin: 0 1em 1em 0;" /></a>
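The path rewrite visible in the two snippets above (from /dotclear/public/ to /assets/dotclear/) could be approximated with a substitution like the one below. This is only a sketch: the helper name, its keyword parameters, and the regular expression are assumptions made for illustration, and the importer may well do this differently.

```ruby
# Hypothetical helper: rewrite href/src attributes that point into the old
# Dotclear public folder so that they point into the new assets folder.
def rewrite_media_urls(html, from: "/dotclear/public/", to: "/assets/dotclear/")
  pattern = /(\b(?:href|src)=")#{Regexp.escape(from)}/
  html.gsub(pattern) { "#{Regexp.last_match(1)}#{to}" }
end

before = '<a href="/dotclear/public/img/dia_1.jpg"><img src="/dotclear/public/img/.dia_1_m.jpg" /></a>'
puts rewrite_media_urls(before)
```

Note that such a substitution only changes the URL text. It does not check whether the referenced file, such as the dot-prefixed thumbnail, exists in the destination.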

The images are treated as links. That was OK in the sense that there used to be two versions of each image, with the small one linking to the big one, but there are no names starting with a dot in assets/dotclear and the link should be turned into an <img> tag.

So we miss the introductions to the entries and the images are treated as links. Can any of these points be fixed programmatically?

ashmaroli commented 1 year ago

@jrfern Added support for importing excerpts. While I had seen the post_excerpt field earlier, I did not realise that post_content doesn't start with the excerpt. Jekyll-generated HTML generally has the excerpt as the first paragraph of the content (the exception being when the user has supplied a custom excerpt string to Jekyll during the build process).
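As an illustration of carrying a separate excerpt field into a Jekyll draft, the sketch below writes it into the front matter under the `excerpt` key, which Jekyll accepts as a custom excerpt. The function name and its keyword arguments are hypothetical, and this is not necessarily how the importer stores the excerpt.

```ruby
require "yaml"

# Hypothetical helper: build the text of a draft from fields of an export record.
# The excerpt is stored in the front matter only when it is non-empty.
def render_draft(title:, content:, excerpt: nil)
  front_matter = { "layout" => "post", "title" => title }
  front_matter["excerpt"] = excerpt unless excerpt.to_s.strip.empty?
  # YAML.dump emits the leading "---" line; the closing "---" is added here.
  "#{YAML.dump(front_matter)}---\n\n#{content}\n"
end

puts render_draft(title: "Example", content: "<p>Body</p>", excerpt: "<p>Intro</p>")
```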

ERROR /assets/dotclear/img/.dia_1_m.jpg not found.. but there are no names starting with a dot in assets/dotclear..

These files do not have a separate entry in the media table of the export file, so they won't be imported or mentioned in the log.

the link should be turned into an <img> tag.

They're already valid img tags. You don't see the image, or a placeholder for the missing image, because of CSS.

jrfern commented 1 year ago

Great! The excerpt was the only problem with the import, the issue with the images was a problem with the backup, not the import.

From my side, the new code works and I have recovered the posts from this old blog.

ashmaroli commented 1 year ago

@jekyllbot: merge +minor
