greenelab / connectivity-search-manuscript

Manuscript describing Hetnet Connectivity Search
https://greenelab.github.io/connectivity-search-manuscript/

Convert vector figures to EPS files for better quality in Gigascience PDF #53

Closed dhimmel closed 1 year ago

dhimmel commented 1 year ago

refs https://github.com/greenelab/connectivity-search-manuscript/issues/52

It appears that Gigascience uses JPG for all images in the web version, but can use EPS to include vector images in the PDF. Therefore in pursuit of a pretty PDF, we should submit EPS files for all figures where the source exists in a vector format.

dhimmel commented 1 year ago

Email

Greetings PXE5 Support,

We would like to provide vector images for select figures. I tried uploading them in PXE5 via replace image and then uploading the file under High Resolution.

However, when I regenerate the PDF, it looks as if the JPG image sources are still being used. Would you be able to upload these EPS images to be used for the respective figures? Let us know if there are any issues with how they render.

Best, Dr. Himmelstein

Attached giad047_f4.eps, giad047_f5.eps, giad047_f6.eps from c560f51adfb1a3113b7cb1419a294cb80be2749a.

Reply

Dear Dr. Himmelstein,

Thanks for your email.

We have forwarded your request to the concerned team and will inform you once it is done.

Regards, Sunil Singh Technical Support | PXE

Reply

Thank you Sunil. Much appreciated.

Here is one additional EPS file to upload for Figure 1.

Best, Dr. Himmelstein

Attached giad047_f1.eps from https://github.com/greenelab/connectivity-search-manuscript/pull/54/commits/1d7565da7501c1cd6591a7268224fbb9e73e209d

Reply

Dear Dr. Himmelstein,

Thanks for your patience and cooperation with us.

We have updated the provided revised figures 4, 5, 6 in PXE as per the below mail.

However as per our art team concern, figure 5 is still received in raster format rather than vector image format.

So, please check the latest PDF (attached) and provide the high resolution image for figure 5 in case existing figure 5 still needs to be replaced as per the requirement.

Please do let us know in case of any further concern in this regard.

Thanks & Regards, Sunil Kumar

Attached: 2023-06-26_giad047-eps-figures-except-5.pdf

dhimmel commented 1 year ago

> However as per our art team concern, figure 5 is still received in raster format rather than vector image format.

I think the issue with figure 5 is the legend transparency, which is added by default in matplotlib in plot-null-dwpc-distributions.ipynb. Transparency is not supported by EPS and therefore pdftops decided to rasterize.

The `framealpha` or `shadow` argument of matplotlib.pyplot.legend would be able to disable this transparency.
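
As a minimal sketch of that idea (not the code from plot-null-dwpc-distributions.ipynb), the legend frame can be made fully opaque by passing `framealpha=1.0`; matplotlib's default `legend.framealpha` is 0.8, which is where the transparency comes from. The plotted data and the in-memory PDF target below are placeholders chosen for illustration.

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# Placeholder data; the real notebook plots null DWPC distributions.
ax.plot([0, 1, 2], [0, 1, 4], label="example series")

# framealpha=1.0 makes the legend background fully opaque,
# so no transparency ends up in the saved vector file.
legend = ax.legend(framealpha=1.0)

# Save as PDF (to an in-memory buffer here) for later conversion to EPS.
buffer = io.BytesIO()
fig.savefig(buffer, format="pdf")
plt.close(fig)
```

Setting `legend.framealpha` in `rcParams` should have the same effect for every legend in a notebook, if changing each call is inconvenient.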

dhimmel commented 1 year ago

pdftops -rasterize never in https://github.com/greenelab/connectivity-search-manuscript/commit/0b6fc8bafb8e4b2213ac63c91d215c2f4dd4a9ff was able to force figure 5 to remain as vector.
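
The exact invocation lives in the linked commit and is not reproduced here. A hedged sketch of such a conversion, assuming poppler's `pdftops` is on the PATH and that the `-eps` flag is used alongside `-rasterize never`, could look like this. The helper names `build_pdftops_command` and `pdf_to_eps` are hypothetical.

```python
import shutil
import subprocess


def build_pdftops_command(pdf_path, eps_path):
    """Return the argument list for a PDF to EPS conversion that never rasterizes."""
    # -eps writes Encapsulated PostScript; -rasterize never keeps pages as vectors.
    return ["pdftops", "-eps", "-rasterize", "never", pdf_path, eps_path]


def pdf_to_eps(pdf_path, eps_path):
    """Run pdftops, failing loudly if the tool is missing or the conversion errors."""
    if shutil.which("pdftops") is None:
        raise FileNotFoundError("pdftops (poppler-utils) was not found on PATH")
    subprocess.run(build_pdftops_command(pdf_path, eps_path), check=True)
```

For example, `pdf_to_eps("figure5.pdf", "giad047_f5.eps")`, where the input PDF name is a placeholder. Because EPS cannot represent transparency, it is worth comparing the output against the source PDF after forcing vector output.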

dhimmel commented 1 year ago

I've reopened this issue to see if we can create EPS vector images for Figure 2 and 3 (source directory), which are screenshots of the webapp that are currently PNG. This is motivated by the PXE5 publication system being unable to include the raster image at suitable resolution to be readable, see https://github.com/greenelab/connectivity-search-manuscript/issues/52#issuecomment-1602870936.

I tried https://www.vectorizer.io/ and https://vectorizer.ai/, which both showed promise converting the raster to vector, but were not perfect (they did not understand that letters should have a single, sharp boundary).

I see Adobe Illustrator has an Image Trace tool, perhaps this would be better? @vincerubinetti do you have any idea whether the raster to vector conversion would be possible and high fidelity using Illustrator?

vincerubinetti commented 1 year ago

Yeah after i made that comment i tried my "vector magic" program and it didn't do a good job with the small letters in circles.

Unfortunately i think we'll just have to retake the screenshots. Doing this on a high dpi display and using Firefox's right click take screen shot feature might help. I'll try to do this later today.

dhimmel commented 1 year ago

> Unfortunately i think we'll just have to retake the screenshots. Doing this on a high dpi display and using Firefox's right click take screen shot feature might help. I'll try to do this later today.

Are you saying we could take the screen shots as vectors? I don't think higher resolution rasters will help, since the issue is that the PXE5 publication system is downsampling the rasters. I have reached out to the editor in chief at Gigascience in hopes of being connected with someone who has the technical levers to improve the image resolution in the PXE5 system.

vincerubinetti commented 1 year ago

I'm talking about higher res raster screenshots, yes. There is no easy or straightforward way to convert a website's DOM into a vector format. Two ways I can think of: 1) make a very high res screenshot, then use a raster-to-vector converter. 2) print the website page as a PDF and import it in illustrator (would require a lot of manual clean up but might end up taking less time than continuing back-and-forth with the publisher). Not sure which of the two will be less of a pain.

I read #52 but am a little confused by the back and forth. Looking at the rasters we provided, Fig. 3D is high res enough to be legible, yet the screenshot where they said they've "processed with the highest quality" is way lower resolution and clarity. I think there is something degraded about the software they are using or how they are using it. I see what you mean by "making it higher res might not solve the problem". Perhaps simply doing the "conversion" into EPS (even though simply importing raster pixels into a vector format doesn't just magically make it vector data) would solve the problem by playing nicer with the software they're using.

vincerubinetti commented 1 year ago

See below:

editing steps

http://localhost:3000/?source=17287&target=7607&metapaths=DaGiGpPW%2CDdGiGpPW%2CDdGpPW%2CDlAeGpPW%2CDrDaGpPW%2CDrDuGpPW%2CDuGiGpPW&complete=

- run app locally and edit such that dropdowns stay open always after opening (otherwise when printing they get hidden)
- remove `filter` css styles as needed so svg icons and such don't get printed as raster images
- remove screen only from css media query for expanding table
- delete all but section of interest from the dom in the dev tools. makes removing extraneous shapes easier, and ensures section doesn't end up across page splits.
- print to pdf with landscape and no margins, and import into illustrator
- use "direct select tool" to highlight just items of interest, and cut
- delete all other items and paste cut items, and select all
- carefully delete any elements hidden behind other elements (eg node info section)
- artboard tool, presets, "fit to selected art"
- save as svg, with "font" = "convert to outline" (if you want more robust font/text handling but bigger file size) and "image location" = "link". if any other images other than the main svg get saved, that means there are raster images being used in the svg, and something is wrong.
- remove all root element attrs except xmlns and viewBox, remove all clip-path attrs on any element, remove all defs
- along the way, pass svg source code through svgomg and prettier, and have a live preview open (vs code) to make sure all the image contents are still there
- if needed, add a rect with the same x/y/width/height as the viewbox for a background fill

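
The SVG cleanup steps near the end of that list (strip root attributes except xmlns and viewBox, drop clip-path attributes and defs, optionally add a background rect) were done by hand in the thread. A rough sketch of how they might be scripted with Python's standard library follows. The function name `clean_svg` and its `background` parameter are hypothetical, and the sketch assumes nothing else in the file depends on the removed defs (gradients, for instance, would break).

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Register namespaces so the output uses a plain xmlns instead of ns0: prefixes.
ET.register_namespace("", SVG_NS)
ET.register_namespace("xlink", "http://www.w3.org/1999/xlink")


def clean_svg(svg_text, background=None):
    """Apply the manual cleanup steps to an SVG string and return the result."""
    root = ET.fromstring(svg_text)

    # Keep only viewBox on the root; xmlns is re-emitted from the tag namespace.
    view_box = root.attrib.get("viewBox")
    root.attrib.clear()
    if view_box is not None:
        root.set("viewBox", view_box)

    # Snapshot the elements first so removals do not disturb the traversal.
    for element in list(root.iter()):
        element.attrib.pop("clip-path", None)
        for child in list(element):
            if child.tag == f"{{{SVG_NS}}}defs":
                element.remove(child)

    # Optional background: a rect with the same x/y/width/height as the viewBox.
    if background is not None and view_box is not None:
        x, y, width, height = view_box.replace(",", " ").split()
        rect = ET.Element(
            f"{{{SVG_NS}}}rect",
            {"x": x, "y": y, "width": width, "height": height, "fill": background},
        )
        root.insert(0, rect)  # first child, so it paints behind everything else

    return ET.tostring(root, encoding="unicode")
```

This only covers the attribute and element pruning; passing the result through svgomg and prettier, and checking a live preview, would still be separate steps.
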
dhimmel commented 1 year ago

Wow! Incredible. Quite a few steps to get that to work. Are you able to arrange these into the multipanel figure?

Does the print to PDF fail with the wider expanded metapath table / Figure 2?

This conversion is heroic.

vincerubinetti commented 1 year ago

Here are all the files. Doing the expanded took more css and print finagling. I included the multi-figure as an illustrator and eps file.

figure.zip

Please compare them carefully to the existing to see that they're the same or adequate.

dhimmel commented 1 year ago

> Please compare them carefully to the existing to see that they're the same or adequate.

Brilliant! Comparing the figures to the existing figures:

I'll go ahead and upload these! Thanks again for this quite challenging conversion.

dhimmel commented 1 year ago

@vincerubinetti any idea why figure.eps is so large (10MB)? Perhaps the production PDF process will reduce the size? Also if possible, can we export an SVG and PDF from illustrator for the figure? Would love to have these on hand in case we ever need them in the future.

vincerubinetti commented 1 year ago

> - Panel A with node selection has different nodes that appear up in the suggested results due to a possible non deterministic tie breaker. The new list is not a problem!

Note this was just me scrolling down 1 entry more in the list.

I'm going to fix the other points (ignore my previous comments here).

vincerubinetti commented 1 year ago

figure.zip

Okay here are the fixed files.

I've included the intermediate steps, including the print to pdf step (with a cleaner procedure). I've exported eps, pdf, and svg for the combined figure and expanded metapaths panel. For eps and pdf, I leave the text as fonts, which saves a little size. For svg, I convert the text to outlines so it can be properly rendered in anything, including as a figure in the html manuscript. I believe this is the smallest each file can get. I also fixed the expanded panel.

Please check closely again.

dhimmel commented 1 year ago

I uploaded the figures from the preceding comment in 918f6af1c06d5dc22a5502997955b48e44e0c2b8. Overall looks great and so close!

I ended up regenerating the EPS files from the PDF using pdftops, which gave smaller file sizes and helped the expanded metapaths table render on my Linux machine. Since pdftops worked for other EPS images that got inserted into the Gigascience PDF, it probably makes sense to use it here.

@vincerubinetti I noticed some selection changes in the new figures. In panel B, both expanded and collapsed, we have not disabled precomputed only. The metapath selection is not identical as before. Same with panel C with path selection. I wonder if the new print/export process ended up toggling all metapath/path checkboxes to selected?

vincerubinetti commented 1 year ago

Using pdftops sounds good.

Are you sure about the discrepancy? I'm looking at the preceding zip upload and comparing it to the current figures and I'm seeing the same thing. And the "precomputed only" is unchecked (gray check icon). I'm seeing the same path count and adjusted p between B and B expanded. Granted, as I'm checking this, I'm only checking the pdfs and svgs, not the eps.

My goal was to match the existing as close as possible. To do so, in the metapaths, I also had to show all of them, click all the ?s to compute the values to fill all of them in and sort properly, then revert back to showing just 10. Similarly in paths, I had to show all to select one much further down the list that ends up including a green anatomy node in the graph.

dhimmel commented 1 year ago

> Are you sure about the discrepancy?

Ah you are right! The PDF source is good, the pdftops command introduced the error, probably because the checkboxes have transparency which did not convert to EPS. So we could try to submit the EPS files you provide and see if they get compressed in the output PDF. I just need to confirm b.metapaths-expanded.eps is healthy, since it gets stuck opening on my viewer and the content is all binary (which might not be a problem). UPDATE: I can open b.metapaths-expanded.eps in Inkscape and it looks good, so I will submit the two EPS files and we'll see what the final PDF size becomes.

dhimmel commented 1 year ago

EPS sourced figures in the published PDF look great. Thanks again @vincerubinetti for this heroic and time-sensitive effort to preserve the beauty of our figures!