Line breaks in list of strings

lewisjared commented 3 years ago

Upon migrating to v1.3, list of strings that end up wider than the column width are being split across multiple lines at the column width. This leads to new behaviour where individual values in that list are being spread across lines. Below I've included some serialized output where the DAT_CO2I_INVERSE_EMMS and DAT_HEATCONTENT_AGGREG_TOTAL values are split across lines causing gfortran to read different values from namelists serialized using v1.2.

out_dynamic_vars = 'DAT_SURFACE_TEMP', 'DAT_SURFACE_MIXEDLAYERTEMP',
'DAT_CO2B_EMIS', 'DAT_CO2I_EMIS', 'DAT_CO2T_EMIS', 'DAT_CO2I_INVERSE_EMM,
S', 'DAT_AEROSOL_ERF', 'DAT_TOTAL_INCLVOLCANIC_ERF', 'DAT_TOTAER_DIR_ERF,
', 'DAT_CLOUD_ALBEDO_ERF', 'DAT_CO2_ERF', 'DAT_CH4_ERF', 'DAT_N2O_ERF',
'DAT_CO2_CONC', 'DAT_CH4_CONC', 'DAT_N2O_CONC', 'DAT_HEATCONTENT_AGGREG_,
TOTAL', 'DAT_FGASSUM_ERF', 'DAT_MHALOSUM_ERF'

Is there any way to disable this new behaviour? Perhaps this is related to the change made in #119?

For reference here is the same item serialized using v1.2

  out_dynamic_vars = 'DAT_SURFACE_TEMP', 'DAT_SURFACE_MIXEDLAYERTEMP',
                     'DAT_CO2B_EMIS', 'DAT_CO2I_EMIS', 'DAT_CO2T_EMIS',
                     'DAT_CO2I_INVERSE_EMMS', 'DAT_AEROSOL_ERF', 'DAT_TOTAL_INCLVOLCANIC_ERF',
                     'DAT_TOTAER_DIR_ERF', 'DAT_CLOUD_ALBEDO_ERF', 'DAT_CO2_ERF',
                     'DAT_CH4_ERF', 'DAT_N2O_ERF', 'DAT_CO2_CONC', 'DAT_CH4_CONC',
                     'DAT_N2O_CONC', 'DAT_HEATCONTENT_AGGREG_TOTAL', 'DAT_FGASSUM_ERF',
                     'DAT_MHALOSUM_ERF'

marshallward commented 3 years ago

I see, it's adding extra commas into the strings. For example, DAT_CO2I_INVERSE_EMMS became DAT_CO2I_INVERSE_EMM,S. Also much harder to read. I agree this needs to be fixed.

The string splitting probably does need to remain due to the issue you linked. But there is no consideration of arrays of strings.

I doubt that I will get a chance to work on this immediately, but will try to find some time next week.

marshallward commented 3 years ago

I have pushed a fix (7690f2e4223cde03f34466dc85397639f9927e9a) which no longer inserts the commas into the elements. At the least, this should make the output usable.

There is still the question of appearance, and how to handle arrays of strings. The simplest answer is to leave the string unsplit and move it to the next line if it can fit, and only split when the whole element exceeds the column width.

jacobwilliams commented 3 years ago

We encountered this bug in our group also. Do you think you will be able to push out a new PyPi release soon?

marshallward commented 3 years ago

I'd like to restore the appearance, but if there is urgency then I can push the current fix.

jacobwilliams commented 3 years ago

Maybe a 1.3.1 release if it isn't too much trouble?

f90nml is a major component of one of our NASA tools and it looks like I used a >= for the version number in one of the setup.py files. :)

marshallward commented 3 years ago

I've pushed a fix (7b27456ce0482ac218b8e2a1d288dd0f35f85d35) which has both the comma fix and will move strings to the next line if they can fit, which should resolve both issues.

This is the new output:

&names_nml
    out_dynamic_vars = 'DAT_SURFACE_TEMP', 'DAT_SURFACE_MIXEDLAYERTEMP',
                       'DAT_CO2B_EMIS', 'DAT_CO2I_EMIS', 
                       'DAT_CO2T_EMIS', 'DAT_CO2I_INVERSE_EMMS', 
                       'DAT_AEROSOL_ERF', 'DAT_TOTAL_INCLVOLCANIC_ERF',
                       'DAT_TOTAER_DIR_ERF', 'DAT_CLOUD_ALBEDO_ERF', 
                       'DAT_CO2_ERF', 'DAT_CH4_ERF', 'DAT_N2O_ERF', 
                       'DAT_CO2_CONC', 'DAT_CH4_CONC', 'DAT_N2O_CONC', 
                       'DAT_HEATCONTENT_AGGREG_TOTAL', 
                       'DAT_FGASSUM_ERF', 'DAT_MHALOSUM_ERF'
/

My test says that they are equal.

If the new commit appears to work for everyone, then I will do a 1.3.1 update over the weekend.

jacobwilliams commented 3 years ago

I'm still having a problem with the latest commit (7b27456ce0482ac218b8e2a1d288dd0f35f85d35):

test.in:


 &BLAHBLAHBLAHBLAH
 BLAHBLAHBLAH%BLAH(1)  = 'generic_kernels\spk\satellites\mar033_2000-2025.bsp',
 BLAHBLAHBLAH%BLAH(2)  = 'generic_kernels\spk\satellites\jup204.bsp',
 BLAHBLAHBLAH%BLAH(3)  = 'generic_kernels\spk\satellites\sat143.bsp',
 BLAHBLAHBLAH%BLAH(4)  = 'generic_kernels\spk\satellites\ura064.bsp',
 BLAHBLAHBLAH%BLAH(5)  = 'generic_kernels\spk\satellites\nep016-6.bsp',
 BLAHBLAHBLAH%BLAH(6)  = 'generic_kernels\spk\satellites\plu006-5.bsp',
 BLAHBLAHBLAH%BLAH(7)  = 'generic_kernels\pck\moon_pa_de421_1900-2050.bpc',
 BLAHBLAHBLAH%BLAH(8)  = 'generic_kernels\fk\satellites\moon_080317.tf.pc',
 BLAHBLAHBLAH%BLAH(9)  = 'generic_kernels\lsk\naif0010.tls.pc',
 BLAHBLAHBLAH%BLAH(10) = 'generic_kernels\pck\pck00010.tpc.pc',
 BLAHBLAHBLAH%BLAH(11) = 'generic_kernels\pck\earth_720101_070426.bpc',
 BLAHBLAHBLAH%BLAH(12) = 'generic_kernels\pck\earth_070425_370426_predict.bpc',
 BLAHBLAHBLAH%BLAH(13) = 'generic_kernels\spk\planets\de421.bsp'
 /

n = f90nml.read('test.in')
n.write('test.out')

test.out:

&blahblahblahblah
    blahblahblah%blah(1:13) = 'generic_kernels\spk\satellites\mar033_20
00-2025.bsp', 'generic_kernels\spk\satellites\jup204.bsp', 'generic_ker
nels\spk\satellites\sat143.bsp', 'generic_kernels\spk\satellites\ura064
.bsp', 'generic_kernels\spk\satellites\nep016-6.bsp', 'generic_kernels\
spk\satellites\plu006-5.bsp', 'generic_kernels\pck\moon_pa_de421_1900-2
050.bpc', 'generic_kernels\fk\satellites\moon_080317.tf.pc', 
                              'generic_kernels\lsk\naif0010.tls.pc', 
                              'generic_kernels\pck\pck00010.tpc.pc', 'g
eneric_kernels\pck\earth_720101_070426.bpc', 'generic_kernels\pck\earth
_070425_370426_predict.bpc', 'generic_kernels\spk\planets\de421.bsp'
/

marshallward commented 3 years ago

My test says that the namelists are identical. Is it just the appearance that is an issue?

jacobwilliams commented 3 years ago

I think the splitting up of the strings themselves was causing problems in one of our tools (I would need to verify details on Monday..).

I do like the v1.2 behavior for this and also for aesthetic reasons. Could the new behavior be an option but not the default?

marshallward commented 3 years ago

I think the string splitting needs to stay, or at least be the default behavior. As discovered in the referenced issue, there are compilers that impose a column limit on namelists, so it should be honored. (Ed: It appears to have been a support library, not a compiler, which was imposing a character limit.)*

You could increase the column_width of your Namelist. Or maybe we could introduce column_width = None as a way of defining unlimited line length.

I could also envision some ways to make it less crowded, such as not starting a new string on the same line as a split string, or trying to fit the string on an unindented line.

marshallward commented 3 years ago

Despite my objections above, I've come around and agree it's best to restore the old behavior as default.

I've tested the situation in several versions of three compilers (GCC, Intel, PGI) and see no indication that string splitting is required to handle larger arrays.

It seems that submitter was working under limits imposed by their applications or tooling, not a compiler. So I will introduce a new optional flag to enable string splitting and will let him know.

If anyone wants the gritty details, see below.

I tested this in three compilers: GFortran, Intel, PGI. GFortran and Intel did not appear to impose any limits. I did find a 299-character limit in PGI, but it was not related to split strings.

The limit in PGI seems to be imposed on the string data itself, not the line width of the namelist. Padding with strings or lengthening the variable name seems to have no impact on the issue.

(Oddly, PGI had no problem producing namelists with more than 300 characters, but it wouldn't be the first time a Fortran program was unable to read its own namelists.)

These are the versions I tested:

GFortran: 7.3, 11.1
Intel: 19.0, 19.1
PGI: 19.10, 20.4

marshallward commented 3 years ago

I've pushed a change which restores the old behavior, and introduces a split_strings flag to produce the new behavior.

I don't think this is the last word on string layout. I would still like to have some limiter on as default, and would like to explore more favorable layout rules. But these were unexpected and unintentional changes, so I will leave things as they are for now.

If there are no concerns then I will do a 1.3.1 release soon.

marshallward commented 3 years ago

I've uploaded v1.3.1 to PyPI which restores the old behavior. I will close this issue now. Please reopen if the problem hasn't been resolved.

marshallward / f90nml

Line breaks in list of strings #131