[REVIEW]: MicroFloatingPoints.jl: providing very small IEEE 754-compliant floating-point types

editorialbot commented 3 months ago

Submitting author: !--author-handle-->@goualard-f@danielskatz<!--end-editor-- Reviewers: @matbesancon, @dannys4, @mkitti Archive: 10.5281/zenodo.13777404

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/e7808ee8a0f10324034b222a79cf48c0"><img src="https://joss.theoj.org/papers/e7808ee8a0f10324034b222a79cf48c0/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/e7808ee8a0f10324034b222a79cf48c0/status.svg)](https://joss.theoj.org/papers/e7808ee8a0f10324034b222a79cf48c0)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@matbesancon & @dannys4 & @mkitti, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @danielskatz know.

✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨

Checklists

📝 Checklist for @matbesancon

📝 Checklist for @mkitti

📝 Checklist for @dannys4

editorialbot commented 3 months ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf

editorialbot commented 3 months ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1145/221332.221334 is OK
- 10.1109/ARITH.2013.22 is OK
- 10.1145/2785965 is OK
- 10.1145/3368086 is OK
- 10.1137/19M1251308 is OK
- 10.1007/11787006_1 is OK
- 10.1007/978-3-030-50417-5_2 is OK
- 10.1145/3585515 is OK
- 10.1145/2382196.2382264 is OK
- 10.3389/fninf.2023.1099510 is OK

MISSING DOIs

- No DOI given, and none found for title: tkgunaratne/BFloat.jl

INVALID DOIs

- None

editorialbot commented 3 months ago

Software report:

github.com/AlDanial/cloc v 1.90  T=0.03 s (1385.0 files/s, 149335.3 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Julia                           29            385            798           1397
SVG                              2              1             28           1100
Markdown                         7            257              0            733
TeX                              1             19              0            147
YAML                             4              3              4             59
TOML                             3              3              0             26
-------------------------------------------------------------------------------
SUM:                            46            668            830           3462
-------------------------------------------------------------------------------

Commit count by author:

    60  Frédéric Goualard
     1  Frédéric GOUALARD

editorialbot commented 3 months ago

Paper file info:

📄 Wordcount for paper.md is 610

✅ The paper includes a Statement of need section

editorialbot commented 3 months ago

License info:

🟡 License found: GNU Lesser General Public License v3.0 (Check here for OSI approval)

editorialbot commented 3 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

danielskatz commented 3 months ago

@matbesancon, @dannys4, and @mkitti - Thanks for agreeing to review this submission. This is the review thread for the paper. All of our communications will happen here from now on.

As you can see above, you each should use the command @editorialbot generate my checklist to create your review checklist. @editorialbot commands need to be the first thing in a new comment.

As you go over the submission, please check any items that you feel have been satisfied. There are also links to the JOSS reviewer guidelines.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention openjournals/joss-reviews#7050 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.

We aim for reviews to be completed within about 2-4 weeks. Please let me know if either of you require some more time. We can also use editorialbot (our bot) to set automatic reminders if you know you'll be away for a known period of time.

Please feel free to ping me (@danielskatz) if you have any questions/concerns.

matbesancon commented 3 months ago

Review checklist for @matbesancon

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the https://github.com/goualard-f/MicroFloatingPoints.jl.git?
[x] License: Does the repository contain a plain-text LICENSE or COPYING file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@goualard-f) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines
[x] Data sharing: If the paper contains original data, data are accessible to the reviewers. If the paper contains no original data, please check this item.
[x] Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
[x] Human and animal research: If the paper contains original data research on humans subjects or animals, does it comply with JOSS's human participants research policy and/or animal research policy? If the paper contains no such data, please check this item.

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

mkitti commented 3 months ago

Review checklist for @mkitti

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the https://github.com/goualard-f/MicroFloatingPoints.jl.git?
[x] License: Does the repository contain a plain-text LICENSE or COPYING file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@goualard-f) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines
[x] Data sharing: If the paper contains original data, data are accessible to the reviewers. If the paper contains no original data, please check this item.
[x] Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
[x] Human and animal research: If the paper contains original data research on humans subjects or animals, does it comply with JOSS's human participants research policy and/or animal research policy? If the paper contains no such data, please check this item.

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

dannys4 commented 3 months ago

Review checklist for @dannys4

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the https://github.com/goualard-f/MicroFloatingPoints.jl.git?
[x] License: Does the repository contain a plain-text LICENSE or COPYING file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@goualard-f) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines
[x] Data sharing: If the paper contains original data, data are accessible to the reviewers. If the paper contains no original data, please check this item.
[x] Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
[x] Human and animal research: If the paper contains original data research on humans subjects or animals, does it comply with JOSS's human participants research policy and/or animal research policy? If the paper contains no such data, please check this item.

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

dannys4 commented 3 months ago

In addition to being able to replicate the results presented in the documentation and use these types accordingly, it seems I can easily and successfully employ these types in functions I've written previously without any adjustments (in accordance with the goals of JuliaLang), so I'd consider the functionality phenomenal. While I might recommend automating the tests on the GitHub CI, all tests (which seem very comprehensive) do pass locally.

The documentation on the repo is very thorough and incredibly well written! A few minor points on the "tour":

I think there should be using PyCall; plt=pyimport("matplotlib.pyplot") (or using PyPlot or whatever it is) included at the top for users who are perhaps unfamiliar with the PyCall terminology.
It seems you are missing a plt for imshow in the tour regarding the plot "Exhaustive search for rounded sums in Floatmu{2,2}".
I believe you're missing a using Distributions in the last code box of the tour
While there's developer docs, it seems like you're missing contributor guidelines (per the checklist).

Again, great job, and I wish all developers paid as much attention to detail as you.

The document is largely well-written. The only (excessively) minor comment is that "fractional" needs to be spell checked on the caption of Figure 1.

After these minor points above are addressed, I'd be more than happy to recommend acceptance.

As a final note, in the spirit of JOSS' suggestion we stay objective, this has no bearing on acceptance or rejection; however, I'd urge you to consider using Julia package extensions for future endeavors (example here) to decouple "extra dependencies" (particularly plotting capabilities) from the "core competency" of the package. While the documentation makes a great example of how the plotting can be used as an educational and demonstrative tool, this could very well be an optional tool that can be used on top of the package as opposed to an absolutely necessary "strong" dependency.

danielskatz commented 3 months ago

Thanks @dannys4

danielskatz commented 2 months ago

👋 @matbesancon & @mkitti - how are your reviews coming along?

danielskatz commented 2 months ago

👋 @goualard-f - Do you have any comments on what @dannys4 brought up? And on the issues that @matbesancon opened? The JOSS review is an iterative process, where you don't need to wait for reviewers to complete their initial passes, but can respond to them as they make comments/suggestions.

matbesancon commented 2 months ago

Hi all, The paper is clear in terms of description of the software. The package is still early maturity as far as Julia packages go, hence the issues I opened on the repository itself. Having performance gotchas seems problematic for fundamental types like floating points.

The paper motivates a use case of the package:

MicroFloatingPoints.jl was, for example, instrumental in our understanding of the flaws of the algorithms computing random floating-point numbers by dividing a random integer by a constant (Goualard, 2020): being able to execute the algorithms for all possible inputs and to quickly display the results graphically in Jupyter gave us the impetus to demonstrate rigorously that such procedures cannot ensure an even distribution of the bits in the fractional parts of the random floats, rendering them useless for applications such as differential privacy

It is not clear to me what the point of the use of Jupyter is here, is it some usability aspects of the library that are highlighted? What changes compared to other floating point types and libraries?

mkitti commented 2 months ago

The authors provide a package for small IEEE-754 compliant floating point types. This is implemented via parameterized Floatmu{szE,szf} type where szE are the number of bits representing the exponent and szf are the number of bits representing the fractional part. The total number of bits used is 1 + szE + szf.

Floatmu stores this information in two 32-bit integers as mentioned in a footnote in the documentation. This is somewhat perplexing since often the goal of creating such minifloats is to minimize the storage needs of the numbers as to store more values in memory. The first of the two 32-bit integers stores the bits of the floating point value itself. The second stores an inexact flag which can take values -1, 0, or 1. The type of the first value could be parameterized while the second value could be made smaller to use a type such as a Int8.

That is rather the current definition of the Floatmu type as follows.

struct Floatmu{szE, szf} <: AbstractFloat
    v::UInt32
    inexact::Int32
end

The type could be made more flexible in the following manner:

struct Floatmu{szE, szf, T} <: AbstractFloat
    v::T
    inexact::Int8
end

The proposed Floatmu{1, 6, UInt8} would only require two bytes rather than eight bytes in the current implementation.

Additionally a global 32-bit inexact_flag is used. This global state creates a thread safety issue should these numerical types be used in a multi-threaded environment. Perhaps one way this could be addressed is by using task local storage.

The paper and documentation do not mention the Julia package BFloat16s.jl although the author does reference BFloat. The author should reference this package and explain the relationship of this work to that prior work as part of explaining the State of the Field.

The overall paper is quite short with a single figure showing a flaw discovered by using the tools within the paper. In contrast the documentation is extensive with multiple figures and a guided tour. The paper could be extended to include a more thorough description of the package and perhaps include some elements from the documentation and guided tour.

There are several software engineering issues that the other reviewers have pointed out. I have submitted some pull requests to the package to try to alleviate some of these issues. The issues include

Unused dependencies
Confusion in the structure of the package and documentation environments
Inclusion of Python and matplotlib as strong rather than weak dependencies

I am still working through the numerical analysis and my review is ongoing.

Overall, the package seems mainly situated as a package meant to analyze the properties of small floating point types rather than providing practical minifloats for use in computational algorithms. Practical use of the package could be improved by addressing the software engineering issues, the global inexact flag, and the size of the Floatmu type. While the work could be evaluated and published purely as an analysis package, I highly recommend that the authors consider some changes to improve the practical use of the package.

danielskatz commented 2 months ago

thanks @mkitti for your thoughtful and detailed comments.

goualard-f commented 2 months ago

@dannys4

I think there should be using PyCall; plt=pyimport("matplotlib.pyplot") (or using PyPlot or whatever it is) included at the top for users who are perhaps unfamiliar with the PyCall terminology.

Added some text at the beginning of Section Graphics with MicroFloatingPoints.MFPPlot to remind the user to add using PyPlot when necessary.

It seems you are missing a plt for imshow in the tour regarding the plot "Exhaustive search for rounded sums in Floatmu{2,2}".

Corrected, even though it did not seem to hinder proper compilation.

I believe you're missing a using Distributions in the last code box of the tour

Corrected (it was performed under the hood).

While there's developer docs, it seems like you're missing contributor guidelines (per the checklist).

There is a section called Contributing and reporting issues at the beginning of the documentation but it does not say much, I am afraid, apart from requiring pull requests to contribute.

The document is largely well-written. The only (excessively) minor comment is that "fractional" needs to be spell checked on the caption of Figure 1.

Corrected.

As a final note, in the spirit of JOSS' suggestion we stay objective, this has no bearing on acceptance or rejection; however, I'd urge you to consider using Julia package extensions for future endeavors (example here) to decouple "extra dependencies" (particularly plotting capabilities) from the "core competency" of the package. While the documentation makes a great example of how the plotting can be used as an educational and demonstrative tool, this could very well be an optional tool that can be used on top of the package as opposed to an absolutely necessary "strong" dependency.

I was not aware of the existence of package extensions. Thanks for pointed them out. I will be sure to investigate them.

Thank you very much for your detailed remarks. I have committed the modifications to the joss2024 branch.

goualard-f commented 2 months ago

@matbesancon

The paper is clear in terms of description of the software. The package is still early maturity as far as Julia packages go, hence the issues I opened on the repository itself. Having performance gotchas seems problematic for fundamental types like floating points.

Two of the issues have been closed, thanks to @mkitti contributions.

I have removed the global variable containing the inexact flag and replaced it by functions with closures. I have also added some text at the beginning of the documentation to make it clear to the prospective user that the package is not about performances but about the versatility of easily defining new floating-point types with different precisions and ranges, something that is not offered by packages such as BFloat16s.jl and other packages with a fixed format.

The paper motivates a use case of the package:

MicroFloatingPoints.jl was, for example, instrumental in our understanding of the flaws of the algorithms computing random floating-point numbers by dividing a random integer by a constant (Goualard, 2020): being able to execute the algorithms for all possible inputs and to quickly display the results graphically in Jupyter gave us the impetus to demonstrate rigorously that such procedures cannot ensure an even distribution of the bits in the fractional parts of the random floats, rendering them useless for applications such as differential privacy

It is not clear to me what the point of the use of Jupyter is here, is it some usability aspects of the library that are highlighted? What changes compared to other floating point types and libraries?

I have modified the wording in the paper to make it clearer that Jupyter only brings to the table an integrated environment in which it is easier to test many snippets of code and see the results, thanks to the scripting nature of Julia.

Thank you for your remarks that help make this package better!

goualard-f commented 2 months ago

@mkitti First, thank you very much for your contribution to this package. For the time being, I have merged your PRs into the joss2024 branch. Everything should be ultimately merged to the main branch.

Floatmu stores this information in two 32-bit integers as mentioned in a footnote in the documentation. This is somewhat perplexing since often the goal of creating such minifloats is to minimize the storage needs of the numbers as to store more values in memory. The first of the two 32-bit integers stores the bits of the floating point value itself. The second stores an inexact flag which can take values -1, 0, or 1. The type of the first value could be parameterized while the second value could be made smaller to use a type such as a Int8.

That is rather the current definition of the Floatmu type as follows.
struct Floatmu{szE, szf} <: AbstractFloat
    v::UInt32
    inexact::Int32
end
The type could be made more flexible in the following manner:
struct Floatmu{szE, szf, T} <: AbstractFloat
    v::T
    inexact::Int8
end

As you state later on yourself, the use case for this package is not the same as, e.g., the BFloat16s.jl package you refer to. The MicroFloatingPoints.jl package is often used for exploratory purpose in algorithms, not necessarily machine learning ones, for which it is not clear what is the impact of precision and range. Performances are traded for versatility: to my knowledge, no other package offers the possibility to choose both the precision and the range (e.g., BFloat16s.jl is stuck with the characteristics of Google Brain's 16 bit floats).

I do not understand how you would use the third parameter T in the definition of Floatmu. My understanding is that it would make the code of the package much harder to write as the algorithms used currently know in advance the size of the integer used to represent a minifloat.

I do not have enough knowledge of the internal representation adopted by Julia to represent a Floatmu to assess the impact of using an Int8 for inexact instead of an Int32. Wouldn't it be a problem alignment-wise?

The proposed Floatmu{1, 6, UInt8} would only require two bytes rather than eight bytes in the current implementation.

Additionally a global 32-bit inexact_flag is used. This global state creates a thread safety issue should these numerical types be used in a multi-threaded environment. Perhaps one way this could be addressed is by using task local storage.

The global variable has been removed and replaced by functions with closure.

The paper and documentation do not mention the Julia package BFloat16s.jl although the author does reference BFloat. The author should reference this package and explain the relationship of this work to that prior work as part of explaining the State of the Field.

I have added a reference to the package in the paper and made it clear that both packages do not have the same use-case.

The overall paper is quite short with a single figure showing a flaw discovered by using the tools within the paper. In contrast the documentation is extensive with multiple figures and a guided tour. The paper could be extended to include a more thorough description of the package and perhaps include some elements from the documentation and guided tour.

My understanding was that the size of the paper should be kept at a minimum since the documentation is the primary source of information. Maybe the editor (@danielskatz) can chip in on that point?

There are several software engineering issues that the other reviewers have pointed out. I have submitted some pull requests to the package to try to alleviate some of these issues. The issues include
* Unused dependencies

Should be solved by your PR

* Confusion in the structure of the package and documentation environments

I am not sure I understand what you are refering to here.

* Inclusion of Python and matplotlib as strong rather than weak dependencies

The package is currently supposed to offer graphical functionnalities as first class citizens, which, I believe makes PyPlot a strong dependency. @dannys4 suggested to use package extensions, which I plan to investigate in the near future.

I am still working through the numerical analysis and my review is ongoing.

Overall, the package seems mainly situated as a package meant to analyze the properties of small floating point types rather than providing practical minifloats for use in computational algorithms. Practical use of the package could be improved by addressing the software engineering issues, the global inexact flag, and the size of the Floatmu type. While the work could be evaluated and published purely as an analysis package, I highly recommend that the authors consider some changes to improve the practical use of the package.

mkitti commented 2 months ago

Thank you for your reponse to my comments. Thus far, I have only posted my initial impression and critiques. I have not yet suggested specific actions. I will break my notes and responses here into several comments.

As you state later on yourself, the use case for this package is not the same as, e.g., the BFloat16s.jl package you refer to. The MicroFloatingPoints.jl package is often used for exploratory purpose in algorithms, not necessarily machine learning ones, for which it is not clear what is the impact of precision and range. Performances are traded for versatility: to my knowledge, no other package offers the possibility to choose both the precision and the range (e.g., BFloat16s.jl is stuck with the characteristics of Google Brain's 16 bit floats).

I think it would be fine to consider this primarily as an analysis package, but I would appreciate if this was better documented upfront.

Another that comes to mind is https://github.com/JuliaMath/FixedPointNumbers.jl, which has an underlying type parameter as well as a parameter to control how many bits are committed to the fractional portion of a fixed point number.

I do not understand how you would use the third parameter T in the definition of Floatmu. My understanding is that it would make the code of the package much harder to write as the algorithms used currently know in advance the size of the integer used to represent a minifloat.

Julia type parameterization system allows one to relatively easily generate generic code. For example, instead of hard coding UInt32 below...

# Mask to retrieve the fractional part (internal use)
significand_mask(::Type{Floatmu{szE,szf}}) where {szE, szf}  = UInt32((UInt32(1) << szf) - 1)
# Mask to retrieve the exponent part (internal use)
exponent_mask(::Type{Floatmu{szE,szf}}) where {szE, szf} = UInt32((UInt32(1) << UInt32(szE))-1) << UInt32(szf)
# Mask to retrieve the sign part (internal use)
sign_mask(::Type{Floatmu{szE,szf}}) where {szE, szf} = UInt32(1) << (UInt32(szE)+UInt32(szf))

... we could replace UInt32 by the parameter T.

# Mask to retrieve the fractional part (internal use)
significand_mask(::Type{Floatmu{szE,szf,T}}) where {szE, szf,T}  = T((T(1) << szf) - 1)
# Mask to retrieve the exponent part (internal use)
exponent_mask(::Type{Floatmu{szE,szf,T}}) where {szE, szf,T} = T((T(1) << T(szE))-1) << T(szf)
# Mask to retrieve the sign part (internal use)
sign_mask(::Type{Floatmu{szE,szf,T}}) where {szE, szf,T} = T(1) << (T(szE)+T(szf))

If you wanted to how many bits a concrete had for storage, you could use a function like storage_bits below.

julia> storage_bits(::Type{<: Floatmu{<: Any, <: Any,T}}) where T = sizeof(T)*8
storage_bits (generic function with 1 method)

julia> storage_bits(Floatmu{5,10,UInt32})
32

I would also like to point out that the functions above all compile down to methods that just return constant values.

julia> @code_llvm significand_mask(Floatmu{5,10,UInt32})
;  @ REPL[7]:1 within `significand_mask`
define i32 @julia_significand_mask_180() #0 {
top:
  ret i32 1023
}

If you have a function that maybe strongly depends on the storage type being UInt32 you could just implement the method for that parameter. A transitional approach might be to do the following.

struct FloatmuGeneric{szE, szf, T} <: AbstractFloat
    v::T
    inexact::Int8
    # constructors ...
end

const Floatmu{szE, szf} = FloatmuGeneric{szE, szf, UInt32}

Now all of your existing code is properly scoped to the storage type being UInt32, but you could generalizing the code to arbitrary unsigned integer type T where it would be easy to do so.

I do not have enough knowledge of the internal representation adopted by Julia to represent a Floatmu to assess the impact of using an Int8 for inexact instead of an Int32. Wouldn't it be a problem alignment-wise?

Julia structs follow the C convention. You are correct that keeping v as a UInt32 and inexact as a Int8 would not save any space due to padding. However, the size could be smaller for smaller storage types.

julia> sizeof(Floatmu{8,14,UInt32})
8

julia> sizeof(Floatmu{8,14,UInt16})
4

julia> sizeof(Floatmu{8,14,UInt8})
2

I do not see any code where alignment would be a major issue. Please correct me if I am wrong.

goualard-f commented 2 months ago

@mkitti

I think it would be fine to consider this primarily as an analysis package, but I would appreciate if this was better documented upfront.

I have modified the introduction of the online documentation to make it clearer and to stress the fact that all formats are represented as a pair of 32 bit integers (see also below). I have also added some text to that effect at the end of the paper (all modifications pushed to the joss2024 branch).

Another that comes to mind is https://github.com/JuliaMath/FixedPointNumbers.jl, which has an underlying type parameter as well as a parameter to control how many bits are committed to the fractional portion of a fixed point number.

I did not know this package. However, it believe it is less relevant than, say, the BFloat16s.jl package you mentioned earlier as fixed-point numbers have an entirely different error profile (the error being of a fixed size throughout the range).

I do not understand how you would use the third parameter T in the definition of Floatmu. My understanding is that it would make the code of the package much harder to write as the algorithms used currently know in advance the size of the integer used to represent a minifloat.

Julia type parameterization system allows one to relatively easily generate generic code. For example, instead of hard coding UInt32 below...
# Mask to retrieve the fractional part (internal use)
significand_mask(::Type{Floatmu{szE,szf}}) where {szE, szf}  = UInt32((UInt32(1) << szf) - 1)
# Mask to retrieve the exponent part (internal use)
exponent_mask(::Type{Floatmu{szE,szf}}) where {szE, szf} = UInt32((UInt32(1) << UInt32(szE))-1) << UInt32(szf)
# Mask to retrieve the sign part (internal use)
sign_mask(::Type{Floatmu{szE,szf}}) where {szE, szf} = UInt32(1) << (UInt32(szE)+UInt32(szf))
... we could replace UInt32 by the parameter T.
# Mask to retrieve the fractional part (internal use)
significand_mask(::Type{Floatmu{szE,szf,T}}) where {szE, szf,T}  = T((T(1) << szf) - 1)
# Mask to retrieve the exponent part (internal use)
exponent_mask(::Type{Floatmu{szE,szf,T}}) where {szE, szf,T} = T((T(1) << T(szE))-1) << T(szf)
# Mask to retrieve the sign part (internal use)
sign_mask(::Type{Floatmu{szE,szf,T}}) where {szE, szf,T} = T(1) << (T(szE)+T(szf))
If you wanted to how many bits a concrete had for storage, you could use a function like storage_bits below.
julia> storage_bits(::Type{<: Floatmu{<: Any, <: Any,T}}) where T = sizeof(T)*8
storage_bits (generic function with 1 method)

julia> storage_bits(Floatmu{5,10,UInt32})
32
I would also like to point out that the functions above all compile down to methods that just return constant values.
julia> @code_llvm significand_mask(Floatmu{5,10,UInt32})
;  @ REPL[7]:1 within `significand_mask`
define i32 @julia_significand_mask_180() #0 {
top:
  ret i32 1023
}
If you have a function that maybe strongly depends on the storage type being UInt32 you could just implement the method for that parameter. A transitional approach might be to do the following.
struct FloatmuGeneric{szE, szf, T} <: AbstractFloat
    v::T
    inexact::Int8
    # constructors ...
end

const Floatmu{szE, szf} = FloatmuGeneric{szE, szf, UInt32}
Now all of your existing code is properly scoped to the storage type being UInt32, but you could generalizing the code to arbitrary unsigned integer type T where it would be easy to do so.

That is an interesting generalization as it could potentially allow to compute with floating-point formats with a bit more precision and range than Float32, which is presently the limit of the package. With Float64 as the support for computation ---as it is now--- we could have floats with a 26 bit precision instead of only 23 at most. In the other direction, we could divide by 2 or 4 the size of the representation of the types when they are very small. From my (albeit limited) interaction with users, I am not yet convinced that this is a requested feature, however since the package is primarily used for preliminary investigations with a limited amount of data.

danielskatz commented 2 months ago

Thanks to everyone for this nice discussion so far

mkitti commented 2 months ago

I have created https://github.com/goualard-f/MicroFloatingPoints.jl/pull/11 that turns PyPlot into a weak dependency via the package extension mechanism.

This also allows the more modern PythonPlot.jl as a weak dependency to be used which depends on PythonCall.jl and micromamba on a per-project basis.

Overall, the dependencies of the core package are reduced those Random and Printf which are both in the standard library. Via the extension mechansim, additional plotting backends could be implemented. This could include Plots.jl or Makie.jl based plots.

danielskatz commented 1 month ago

👋 @goualard-f - Any thoughts on @mkitti's suggestions above?

danielskatz commented 1 month ago

@goualard-f - I also notice that the only item @dannys4 hasn't checked of is related to community guidelines. Can you comment on how your software repository handles this? Or improve it?

goualard-f commented 1 month ago

Dear @mkitti

I have created goualard-f/MicroFloatingPoints.jl#11 that turns PyPlot into a weak dependency via the package extension mechanism.

This also allows the more modern PythonPlot.jl as a weak dependency to be used which depends on PythonCall.jl and micromamba on a per-project basis.

Overall, the dependencies of the core package are reduced those Random and Printf which are both in the standard library. Via the extension mechansim, additional plotting backends could be implemented. This could include Plots.jl or Makie.jl based plots.

Wow! That is a really nice contribution! Than you again for your work. I have merged your PR and added a bit of documentation about the new features (in addition to the one you had already provided).

goualard-f commented 1 month ago

@danielskatz

👋 @goualard-f - Any thoughts on @mkitti's suggestions above?

I have taken some time to review @mkitti's contribution. Once again, he did a great job that enhances significatively the modularity of the package. I have merged his PR into the joss2024 branch.

goualard-f commented 1 month ago

@danielskatz

@goualard-f - I also notice that the only item @dannys4 hasn't checked of is related to community guidelines. Can you comment on how your software repository handles this? Or improve it?

I have added aCONTRIBUTING.md file that makes clear what is considered a good contribution to the package and how to report issues.

matbesancon commented 1 month ago

@whedon generate pdf

editorialbot commented 1 month ago

My name is now @editorialbot

matbesancon commented 1 month ago

@editorialbot generate pdf

editorialbot commented 1 month ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

dannys4 commented 1 month ago

I see the contribution in that branch, and I believe this satisfies the objective of the checklist item

danielskatz commented 1 month ago

👋 @goualard-f and @dannys4 - based on the comment above, it seems like that branch could be merged into the main branch now? Then the item would be complete for JOSS.

danielskatz commented 1 month ago

👋 @matbesancon & @mkitti - I see you both just have a few items left to check off. What else is needed from the authors at this point?

mkitti commented 1 month ago

@danielskatz , could you respond to the following?

My understanding was that the size of the paper should be kept at a minimum since the documentation is the primary source of information. Maybe the editor (@danielskatz) can chip in on that point?

My contention is that while the documentation does a good job of introducing the package, the paper does not. In particular, these two webpages and their figures provide a great impression of what this package does.

My recent experience with JOSS suggests that the paper could carry more content such that the paper itself carries a full introduction and a "guided tour" of the capabilities of the package:

In particular, the above papers which I reviewed recently include:

Explanations of their mathematical basis if relevant.
Example of code of how to use the package.
Figures explaining the concepts and generated by the above code.

My opinion is that while the the package and its documentation will serve as a living ongoing project, the paper itself should be a complete description of the package at this point in time. As the author already has the materials available, I think is quite relevant and plausible to ask that these elements also be included in the paper.

For potential users of this package, the paper may be the only description of the software that they have before they try to evaluate if the next project may be more relevant to their needs than this one. Thus, I think the paper should be complete and not reliant on the documentation to give a full impression of the package in totality.

mkitti commented 1 month ago

The one technical point I am still working through is the numerical analysis portion. I will finish that this week.

danielskatz commented 1 month ago

@editorialbot check repository

I'm running this to check on the paper's word count

editorialbot commented 1 month ago

Software report:

github.com/AlDanial/cloc v 1.90  T=0.04 s (1477.9 files/s, 163729.6 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Julia                           32            403            819           1452
SVG                              2              1             28           1114
Markdown                         8            278              0            765
TOML                             4            115              1            473
TeX                              1             19              0            147
YAML                             5              1              7            138
-------------------------------------------------------------------------------
SUM:                            52            817            855           4089
-------------------------------------------------------------------------------

Commit count by author:

    69  Frédéric Goualard
     9  Mark Kittisopikul
     1  Frédéric GOUALARD

editorialbot commented 1 month ago

Paper file info:

📄 Wordcount for paper.md is 681

✅ The paper includes a Statement of need section

editorialbot commented 1 month ago

License info:

🟡 License found: GNU Lesser General Public License v3.0 (Check here for OSI approval)

danielskatz commented 1 month ago

My recent experience with JOSS suggests that the paper could carry more content such that the paper itself carries a full introduction and a "guided tour" of the capabilities of the package:

[...]

In particular, the above papers which I reviewed recently include:

Explanations of their mathematical basis if relevant.

Example of code of how to use the package.

Figures explaining the concepts and generated by the above code.

My opinion is that while the the package and its documentation will serve as a living ongoing project, the paper itself should be a complete description of the package at this point in time. As the author already has the materials available, I think is quite relevant and plausible to ask that these elements also be included in the paper.

For potential users of this package, the paper may be the only description of the software that they have before they try to evaluate if the next project may be more relevant to their needs than this one. Thus, I think the paper should be complete and not reliant on the documentation to give a full impression of the package in totality.

JOSS suggests that paper should be between 250 and 1000 words. This paper is currently 681, so there is room for more content.

On the other hand, JOSS papers typically are still somewhat brief, and not complete tutorials on how to use the software, including complex examples. JOSS papers typically give a high-level summary of the paper, and point to the documentation or repository for details.

So I think it's reasonable for the author to try to improve the paper as requested, with the constraint of the JOSS word limit.

goualard-f commented 1 month ago

@editorialbot check repository

editorialbot commented 1 month ago

Software report:

github.com/AlDanial/cloc v 1.90  T=0.04 s (1401.8 files/s, 172102.4 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
SVG                              3              1             47           1782
Julia                           32            403            819           1447
Markdown                         8            287              0            804
TOML                             4            115              1            473
TeX                              1             21              0            161
YAML                             5              1              7            138
-------------------------------------------------------------------------------
SUM:                            53            828            874           4805
-------------------------------------------------------------------------------

Commit count by author:

    70  Frédéric Goualard
     9  Mark Kittisopikul
     1  Frédéric GOUALARD

editorialbot commented 1 month ago

Paper file info:

📄 Wordcount for paper.md is 981

✅ The paper includes a Statement of need section

editorialbot commented 1 month ago

License info:

🟡 License found: GNU Lesser General Public License v3.0 (Check here for OSI approval)

goualard-f commented 1 month ago

@danielskatz @matbesancon @mkitti @dannys4 I have modified the article to include a quick tour; I have also added a section to make clearer the limitations of the package.

matbesancon commented 1 month ago

@editorialbot generate pdf

editorialbot commented 1 month ago

:warning: An error happened when generating the pdf.

openjournals / joss-reviews