mwilliamson / mammoth.js

Convert Word documents (.docx files) to HTML
BSD 2-Clause "Simplified" License
5.03k stars 547 forks

Checkbox Form Field Not Supported #415

Open rosenbauerwillbacker opened 2 months ago

rosenbauerwillbacker commented 2 months ago

Thank you for open sourcing Mammoth. I want to use Mammoth to digitize docx files for hospitals.

My biggest issue, however, is that form field checkboxes are ignored. Our docx files have hundreds of checkboxes, so it would be awesome if I could get this working.

Form Field Checkbox: Form Checkbox.docx

The docx produces: <p> Laboratory professional</p><p> Other medical professionals</p>

Goal: I want the form checkbox to be exported like a normal checkbox. Default Checkbox.docx

This docx produces the right output: <p>☐ Checkbox not ticked</p><p>☒ Checkbox ticked</p>

End note: Form checkboxes have become a standard feature in Word in the business world. Thus, I think this would be a crucial feature to support.

mariatahler commented 2 months ago

Checkboxes are also very important for my use case. Could you perhaps look into this @mwilliamson?

jan-schweiger commented 2 months ago

I will also sponsor broad checkbox support with $150. @mwilliamson checkboxes are also really important for our upcoming production test. Do you think Mammoth could support it? Thank you in advance :)

mwilliamson commented 2 months ago

I haven't had a chance to look at how checkboxes are represented yet, but I think the main question I have right now is what the output should be. The suggestion so far is to represent them as Unicode symbols: is that what everyone would expect, or are there some cases where HTML checkboxes (i.e. <input type="checkbox" />) would be expected?

jan-schweiger commented 2 months ago

Thank you for your response @mwilliamson. I think both solutions (Unicode characters or an HTML checkbox input) are fine. However, I think it would be great to have one common way of representing checkboxes in the output, so in my opinion it should be only one of the two.

I think Mammoth users might expect <input type="checkbox" /> and <input type="checkbox" checked />. But if that is more complicated to implement, a Unicode representation is perfectly fine as well.

Checkboxes really have become a common feature nowadays, so it would be great to have that in Mammoth.
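
For documents that already convert to the Unicode symbols, the `<input>` form suggested above could be approximated by post-processing the HTML string. This is only a rough sketch and not something Mammoth provides: `checkboxSymbolsToInputs` is a hypothetical helper name, and it assumes that ☐ (U+2610) and ☒ (U+2612) never appear as literal text in the document, since any such characters would be converted too.

```javascript
// Hypothetical post-processing helper; not part of Mammoth's API.
// Assumption: every ☐ (U+2610) and ☒ (U+2612) in the HTML stands for a
// checkbox. Literal uses of these characters would be converted as well.
function checkboxSymbolsToInputs(html) {
  return html
    .replace(/\u2610/g, '<input type="checkbox" />')
    .replace(/\u2612/g, '<input type="checkbox" checked />');
}

// Example input taken from the output quoted earlier in this thread.
const converted = checkboxSymbolsToInputs(
  "<p>☐ Checkbox not ticked</p><p>☒ Checkbox ticked</p>"
);
console.log(converted);
```

It does nothing for form field checkboxes, which currently produce no symbol at all.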

rosenbauerwillbacker commented 2 months ago

@mwilliamson Thank you for helping us out here. I agree, both solutions would work fine. However, I think <input type="checkbox" /> should be preferred, because you could have the Unicode character ☒ in your Word file as well.

Do you see a way to identify form checkboxes?
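
As a starting point, here is a rough sketch of how they might be found in the raw XML. It rests on an assumption about the file format rather than on anything in Mammoth: as far as I understand WordprocessingML, a legacy form field checkbox is stored in `word/document.xml` (inside the .docx zip) as a `<w:checkBox>` element within `<w:ffData>`, with an optional `<w:checked>` for the current state and `<w:default>` for the initial state. The function names are hypothetical, and a regex scan like this is far less robust than a real XML parser.

```javascript
// Sketch only: scans raw WordprocessingML for legacy form field checkboxes.
// Assumption: each one appears as <w:checkBox>...</w:checkBox> inside
// <w:ffData>, with optional <w:checked> and <w:default> children.
function findFormCheckboxes(documentXml) {
  const results = [];
  const pattern = /<w:checkBox>([\s\S]*?)<\/w:checkBox>/g;
  let match;
  while ((match = pattern.exec(documentXml)) !== null) {
    const inner = match[1];
    // Prefer the current state, then the default state, then unchecked.
    const checked = isOn(inner, "checked") ?? isOn(inner, "default") ?? false;
    results.push({ checked });
  }
  return results;
}

// Returns true or false when <w:NAME> is present, undefined when it is absent.
function isOn(xml, name) {
  const element = new RegExp(`<w:${name}\\b([^>]*)>`).exec(xml);
  if (!element) return undefined;
  const val = /w:val="([^"]*)"/.exec(element[1]);
  // An on/off element without w:val is treated as "on".
  return !val || !["0", "false", "off"].includes(val[1]);
}
```

Content-control checkboxes (the kind that already convert to ☐ and ☒) are stored differently, so this scan would not pick them up.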

rosenbauerwillbacker commented 2 months ago

Any updates on this one? We are still looking for a solution.

mwilliamson commented 2 months ago

Nope, no update.

rosenbauerwillbacker commented 2 months ago

Thank you for the reply. Hopefully there will be a solution.

marciaterzo commented 2 months ago

I also encountered this bug. Checkboxes disappear and are not even present as Unicode characters when importing my Word file. Is there anything we can do about it?

mwilliamson commented 2 months ago

Mammoth doesn't pay the bills, so I'm afraid I don't spend much time working on it these days.

marciaterzo commented 2 months ago

It's very sad to hear that. But I understand. Thank you for the fantastic work you've put into Mammoth over the years. You've certainly made a difference ❤️.

jan-schweiger commented 2 months ago

I think it would be very sad if Mammoth were no longer supported and became stale.

My company is currently testing Mammoth. Unfortunately, there are a few inconsistencies like this that we still need to fix. If we can do that, we can move it to production, and I'm pretty sure I can get a sponsorship from my company for Mammoth.

Until then, I can offer to support Mammoth personally with $500/month. Unfortunately, that's all I can afford with my current salary.

Do you know others who use Mammoth in production, @mwilliamson? Maybe you could also reach out to them, so that we can raise a small amount of funding to keep Mammoth alive.

It would be really sad if support and development for Mammoth stopped altogether and the project became stale.

mwilliamson commented 1 month ago

Just to clarify: the status quo is that Mammoth is still maintained, that is, it receives bug fixes and occasional new features.

The main reason I work on the project is for fun, so I'm unlikely to start more actively trying to chase funding: at that point, it starts looking more like a job, and (a) I already have one of those, (b) the nature of providing an open source library is that, broadly speaking, I have no idea who's using it, and (c) I suspect that it wouldn't be a particularly well-paying job.

If I'm being entirely honest, another aspect is that a lot of requests come from commercial companies, and I'm increasingly unmotivated to implement new features for free or well below what I would be paid for commercial work when the beneficiary is a profit-making enterprise.

That might be too much information, or not the answer you were hoping to hear, but I hope it clarifies the situation!

rosenbauerwillbacker commented 1 month ago

Got it!

As I mentioned before, I am trying to use Mammoth to digitize the documentation in our hospital. Our hospital is under a lot of pressure and saving time is really important. I honestly think it's a compliment to your great work that Mammoth can be used by businesses and NGOs and actually make a difference.

I'm sorry if the comments here demotivated you from working on Mammoth and enjoying it.

Thank you for the great work so far.

jan-schweiger commented 1 month ago

> Mammoth doesn't pay the bills, so I'm afraid I don't spend much time working on it these days.

Sorry, I misunderstood your message.

jan-schweiger commented 1 month ago

I just wanted to help. Sorry :(

mwilliamson commented 1 month ago

No need to apologise, I didn't think your reply was out of place or anything! I just wanted to give a bit of clarity that Mammoth is still being maintained, but I'm unlikely to put vast amounts of time into it in the near future.