nissl-lab / npoi

a .NET library that can read/write Office formats without Microsoft Office installed. No COM+, no interop.
Apache License 2.0

Support adding line endings when a string containing \n is added to the document #1260

Closed Zt-freak closed 1 month ago

Zt-freak commented 9 months ago

NPOI Version Used

2.6.2

File Type

Use Case

I want to be able to add line endings to a document. However, every line ending (\n) in the string I add gets converted into a space.

I tested with both XWPFParagraph.ReplaceText() and XWPFRun.SetText(); both of them convert \n to a space.

Description

I have an example program that calls both XWPFParagraph.ReplaceText() and XWPFRun.SetText():

using System.IO;
using System.Linq;
using NPOI.XWPF.UserModel;

// Open the template document (its content is described below).
using FileStream rs = File.OpenRead(@"test.docx");
using var doc = new XWPFDocument(rs);

XWPFParagraph firstPara = doc.Paragraphs.First();

// Both calls below pass strings containing \n.
firstPara.ReplaceText("{test}", "A\nB\nC\nD");

XWPFRun run = firstPara.CreateRun();
run.TextPosition = 8;
run.SetText("E\nF\nG\nH");

// Save the result.
using var ws = File.Create(@"output.docx");
doc.Write(ws);

The document contains one paragraph containing the text:

{test}

The resulting document will contain the following text:

A B C D E F G H

While the desired result would be:

A
B
C
D
E
F
G
H

Bykiev commented 8 months ago

Hi, you can use the AddCarriageReturn() method like this:

string text = "E\nF\nG\nH";
var lines = text.Split('\n');

// The first line goes into the run as plain text.
run.SetText(lines[0]);

// Every following line is preceded by an explicit line break.
for (int i = 1; i < lines.Length; i++)
{
    run.AddCarriageReturn();
    run.AppendText(lines[i]);
}

Zt-freak commented 7 months ago

@Bykiev this solution using AddCarriageReturn appears to work, except within tables.

Bykiev commented 7 months ago

@Zt-freak, you can use this code for tables:

  var table = doc.Tables.First();
  var row = table.GetRow(0);
  var cell = row.GetCell(0);
  foreach (var p in cell.Paragraphs)
  {
      foreach (var r in p.Runs)
      {
          if (!string.IsNullOrWhiteSpace(r.Text))
          {
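              // "\\n" is the literal two-character sequence \n (a backslash followed by n), not a newline character.
              var lines2 = r.Text.Split(new string[] { "\\n" }, StringSplitOptions.None);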

              r.SetText(lines2[0]);

              for (int i = 1; i < lines2.Length; i++)
              {
                  r.AddCarriageReturn();
                  r.AppendText(lines2[i]);
              }
          }
      }
  }

Zt-freak commented 7 months ago

@Bykiev

Sadly, this doesn't work on my end; it only removes the "\n"s from the paragraph.

Bykiev commented 7 months ago

Please attach your code and file.

Zt-freak commented 7 months ago

@Bykiev https://github.com/Zt-freak/BykievNPOI

Bykiev commented 7 months ago

@Zt-freak, the code looks good, but the paragraph in your document contains multiple runs. I opened the document in WPS Office, selected the second cell and cleared the formatting; after that there was only 1 run. With MS Word, however, I get 10 runs after clearing the formatting. As this is not related to NPOI, the best approach is to concatenate the paragraph's runs into one string and then split it on the newline symbol.
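
The approach described above (join the text of all runs in a paragraph, then split on the newline marker) could look roughly like the sketch below. MergeRunsAndBreakLines is a hypothetical helper name, not part of NPOI. It assumes the NPOI members already used in this thread (Runs, Text, SetText, AddCarriageReturn, AppendText) plus XWPFParagraph.RemoveRun(int). It keeps only the first run, so any formatting carried by the removed runs is lost.

```csharp
using System;
using System.Linq;
using NPOI.XWPF.UserModel;

// Hypothetical helper (not an NPOI API): merges all runs of a paragraph into
// the first run and turns every occurrence of `separator` into a line break.
static void MergeRunsAndBreakLines(XWPFParagraph paragraph, string separator)
{
    if (paragraph.Runs.Count == 0)
    {
        return;
    }

    // A marker such as \n may be split across several runs, so join first.
    string combined = string.Concat(paragraph.Runs.Select(r => r.Text));

    // Remove every run except the first, from the end so indexes stay valid.
    for (int i = paragraph.Runs.Count - 1; i >= 1; i--)
    {
        paragraph.RemoveRun(i);
    }

    string[] lines = combined.Split(new[] { separator }, StringSplitOptions.None);

    XWPFRun first = paragraph.Runs[0];
    first.SetText(lines[0]);
    for (int i = 1; i < lines.Length; i++)
    {
        first.AddCarriageReturn();
        first.AppendText(lines[i]);
    }
}
```

Pass "\\n" as the separator when the document contains the literal characters \n, or "\n" when the text contains real newline characters.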

tonyqus commented 7 months ago

@Bykiev Please ignore the behavior of WPS Office. It's usually different from Microsoft Word.

pbvs commented 6 months ago

Hi @tonyqus , @Bykiev , @Zt-freak ,

I was looking into this and I was wondering if I could suggest an alternative solution to the problem. Our solution works as follows: we define a template in a Word document and then merge it with a data set to get the final document. We define a set of placeholders in Word, which the software then replaces with the actual values from the data set. This means that our software is basically one big search-and-replace exercise. For example, my Word document looks like this: image

The software then has to replace the text “$replace_text$” and “$replace_cell_text$” with a given text string; in my example I want to replace the tag with the value “Regel1\nRegel2\nRegel3”.

I created a small unit test:

image

If I run the unit test, NPOI generates the following output: image

As suggested by @Bykiev, we would need to break each part up into a separate run, but that turns out to be rather difficult. If I create a Word document, add three lines and then look at the XML, I see that Word has created a run for each line.

image

I checked the Open XML documentation (Office Open XML (OOXML) - Word Processing - Text), and a line break can also be encoded in XML as “<w:br/>”. So I did some digging. When w:t is written to XML, this is done in the class “wml.cs”, in the function “Write”. What if we replace the “\n” in this function with a <w:br/> element? I made the following modification: image

If I run my unit test with this modification, I do get the result I am looking for:

image

The XML for the top part now looks like this:

image

And the table cell looks like this:
image

I would like to hear your thoughts on the matter and on the change I propose.
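
Since the screenshots are not preserved here, the idea of emitting a break element for each “\n” can be illustrated with a small standalone sketch. This is not the actual change to wml.cs and does not use NPOI. BuildRun is a hypothetical helper that uses System.Xml.Linq to build a w:r element in which the text between newlines becomes w:t elements separated by w:br elements.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical illustration only: builds a WordprocessingML run (w:r) in which
// every "\n" in the input becomes a w:br element between w:t elements.
static XElement BuildRun(string text)
{
    XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    var run = new XElement(w + "r",
        new XAttribute(XNamespace.Xmlns + "w", w.NamespaceName));

    string[] lines = text.Split('\n');
    for (int i = 0; i < lines.Length; i++)
    {
        if (i > 0)
        {
            // A line break precedes every line after the first.
            run.Add(new XElement(w + "br"));
        }
        if (lines[i].Length > 0)
        {
            // Empty lines produce only the break, no empty w:t element.
            run.Add(new XElement(w + "t", lines[i]));
        }
    }
    return run;
}

// Print the resulting XML for the value used in the example above.
Console.WriteLine(BuildRun("Regel1\nRegel2\nRegel3").ToString(SaveOptions.DisableFormatting));
```

For the three-line example value, the run contains three w:t elements with two w:br elements between them. The sketch does not handle xml:space="preserve", which text with leading or trailing spaces would probably need.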

Bykiev commented 6 months ago

@pbvs, looks good. Would you like to contribute?

pbvs commented 6 months ago

@Bykiev Sure, no problem. I will make the change as described above, do some additional testing and create a pull request when finished.