In 9: “...massive amount of visual and audio input.” I believe the adjective corresponding to “audio” is “aural”, and the noun corresponding to “visual” is “video”. I think you should use either nouns or adjectives consistently instead of mixing them.
In 9: “We’ll also see attention in graph neural networks and they are common in computer vision.” -> “...neural networks that are common in computer vision.”
In 9: “it’s rank” -> “its rank”. Overall, I find the whole sentence confusing: “Note that often the query is batched, so that it’s rank will be 2 if batched and the output’s rank will be 2.” When will the output’s rank be 2? Also when batched?
Table in 9: “All words in sentence represented as matrix of…” -> “...in a sentence represented as a matrix of…”
In 9.1: “The values could be identical to the keys, which is common.” -> “It is common that the values are identical to the keys.”
In 9.1: “3 dimensional” -> “3-dimensional”
In 9.1: “positive word (happy) or a negative word (angry)” -> “positive word (“happy”) or a negative word (“angry”)”
In 9.2: “Usually this is achieved...” -> “Usually, this is…”
In 9.8: “Typically we apply multiple sequential blocks of attention, so need the values input to the next block to be of rank 2 again instead of the (H,L,V) tensor.” -> “Typically, we apply multiple sequential blocks of attention, and thus need the input values to the next block to be of rank 2 instead of the rank(?) of the (H,L,V) tensor.”?